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ABSTRACT 


Software-defined  network  (SDN)  orchestration,  the  problem  of  integrating  and  deploy¬ 
ing  multiple  network  control  functions  (NCFs)  while  minimizing  suboptimal  network 
states  that  can  result  from  competing  NCF  proposals,  is  a  challenging  open  prob¬ 
lem.  In  this  work,  we  formulate  SDN  orchestration  as  a  multiobjective  optimization 
problem,  present  an  evolutionary  algorithm  designed  to  explore  the  NCF  tradeoff 
space  comprehensively  and  avoid  local  optima,  and  propose  a  new  application-aware 
approach  that  explicitly  models  resource  preferences  of  individual  application  work¬ 
loads.  Further,  we  propose  a  new  logical  application  workload  (LAW)  abstraction  to 
enable  precomputation  of  the  required  relative  positioning  of  an  application’s  virtual 
machines  (VMs)  and  allocation  of  these  VMs  in  a  single  atomic  step,  leading  to  online 
algorithms  that  are  one  order  of  magnitude  faster  than  existing  solutions  for  placing 
data  center  workloads.  For  an  instance  of  the  SDN  orchestration  problem  subject  to 
four  independent  NCFs  attempting  to  optimize  network  survivability,  bandwidth  ef¬ 
ficiency,  power  conservation,  and  computational  contention,  we  demonstrate  that  our 
approach  enumerates  a  wider  range  of,  and  potentially  better,  solutions  than  current 
orchestrators,  for  data  centers  with  hundreds  of  switches,  thousands  of  servers,  and 
tens  of  thousands  of  VM  slots. 
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CHAPTER  1: 

Introduction 


Cloud  computing  [1],  the  practice  of  using  warehouse-scale  networks  of  computer 
servers  called  data  centers  to  remotely  store,  manage,  and  process  user  data,  is  be¬ 
coming  increasingly  popular  as  a  cost-effective  and  on-demand  solution  for  both  cor¬ 
porate  (e.g.,  Google,  Amazon,  Microsoft)  and  government  (e.g.,  Defense  Information 
Systems  Agency)  service  providers  to  provide  mission-critical  applications  such  as 
email,  video  teleconferencing,  and  mission  planning,  to  cloud  tenants.  A  cloud  ten¬ 
ant,  or  simply  “tenant,”  is  a  consumer  of  a  provided  cloud  service.  Such  consumers 
may  include  government  agencies,  corporate  enterprises,  or  even  individual  users. 
The  relationship  between  cloud  provider  and  tenant  is  typically  defined  in  terms  of 
special  contract  called  a  service- level  agreement  (SLA),  which  defines  the  amount  of 
physical  data  center  resources  (e.g.,  CPU,  memory,  storage,  link  bandwidth,  etc.)  to 
be  allocated  to  some  tenant  application  requirement,  or  “workload”.  A  workload  is 
typically  represented  by  the  number  and  size  of  application  components  (i.e.,  virtual 
machines)  required  to  support  the  tenant  application.  It  is  the  task  of  the  data  cen¬ 
ter  operator  to  allocate  these  application  workload  components  to  resources  of  the 
physical  infrastructure  that  can  support  them  (e.g.,  physical  host  servers,  switches, 
middleboxes).  Quality-of-service  (QoS)  requirements,  such  as  latency,  response  time, 
throughput,  and  application  availability  required  by  the  tenant  workload  may  also  be 
stated  in  the  SLA.  Hence,  although  cloud  tenants  are  typically  only  concerned  with  a 
single  objective,  namely  the  cloud  provider’s  ability  fulfill  their  stated  SLA,  the  goals 
of  the  data  center  operators  providing  cloud  services  are  more  complex:  their  aim  is 
to  fulfill  the  SLA  of  all  tenants  while  jointly  achieving  a  multitude  of  “operator  ob¬ 
jectives”.  Such  objectives  include  fault  tolerance,  typically  measured  by  the  fraction 
of  application  components  (e.g.,  virtual  machines)  that  survive  a  single  worst  case 
physical  component  (e.g.,  network  device)  failure;  network  communication  efficiency, 
which  may  be  measured  by  the  average  (or  maximum)  amount  of  bandwidth  allocated 
(or  latency  incurred)  over  a  set  of  network  links,  and  power  conservation,  measured 
by  the  amount  electrical  power  used. 


1 


A  recent  Cisco  report  [2]  provides  clear  evidence  that  “enterprise  and  government 
organizations  are  moving  from  test  environments  to  placing  more  of  their  mission- 
critical  application  workloads  into  the  cloud.”  According  to  this  report,  the  number 
of  application  workloads  processed  by  cloud  data  centers  and  the  total  amount  of 
network  traffic  generated  by  these  workloads  are  each  expected  to  increase  by  over 
300%  over  the  next  five  years.  This  unprecedented  demand  for  cloud  services  has  led 
to  the  development  of  complex  large-scale  data  centers  to  host  them  while  creating  a 
multitude  of  new  challenges  for  the  data  center  operators  charged  with  orchestrating 
their  deployment. 

First,  certain  operator  objectives  like  fault  tolerance,  network  communication  effi¬ 
ciency,  and  power  conservation  may  be  in  conflict  with  each  other,  and  hence  require 
tradeoff  considerations.  Second,  different  types  of  applications  may  bottleneck  dif¬ 
ferent  data  center  resources.  For  instance,  placing  components  of  computationally 
intensive  workloads  on  the  same  physical  host  server  may  cause  a  throughput  bot¬ 
tleneck  from  CPU  overcontention,  whereas  data  intensive  workload  components  may 
cause  a  network  bottleneck  if  spread  across  several  different  physical  hosts  due  to  the 
high  data  throughput  required  of  such  tasks.  Thus,  different  types  of  applications 
require  different  resource  allocation  considerations.  Third,  the  data  center  networks 
that  serve  as  the  backbone  of  cloud  service  providers  are  large,  on  the  order  of  hun¬ 
dreds  of  switches,  thousands  of  servers,  and  tens  of  thousands  of  virtual  machines 
(VMs),  or  larger  [2];  thus,  for  automated  orchestration  solutions  to  be  effective,  they 
must  be  scalable  to  large  network  topologies  (e.g.,  thousands  of  devices)  at  reasonable 
time  scales  (e.g.,  order  of  seconds). 

The  key  question  that  this  dissertation  answers  is  how  to  orchestrate  the  allocation 
of  tenant  workloads  in  data  center  networks,  in  a  scalable  way,  to  comprehensively 
explore  the  tradeoff  space  between  competing  operator  objectives  while  minimizing 
resource  contention  among  individual  applications. 
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1.1  Orchestration  Challenges  in  Data  Centers 


Software-defined  networking  (SDN)  technology  provides  an  open  platform  for  de¬ 
veloping  specialized  network  control  functions  (NCFs)  to  achieve  separate  operator 
goals,  such  as  bandwidth  efficiency  [3,4],  fault  tolerance  optimization  [5],  power  con¬ 
servation  [6],  quality-of-service  (QoS)  control  [7],  and  security  enforcement  [8].  In 
this  work,  our  model  for  network  orchestration  is  similar  to  Statesman  [9],  Coryban- 
tic  [10],  and  Athens  [11],  where  each  NCF  is  a  specialized  SDN  control  program  that 
proposes  changes  to  the  current  network  state  for  consideration  by  an  orchestration 
program  in  order  to  accomplish  some  specific  objective.  The  orchestration  program  is 
charged  with  evaluating  the  proposals  of  each  NCF,  and  subsequently  implementing 
any  changes  to  the  physical  network.  Proposed  changes  made  by  NCFs  may  include 
allocating  or  migrating  tenant  VMs  to  host  servers,  adding  or  removing  forwarding 
rules  from  routing  tables,  updating  access  control  lists  on  network  devices,  powering 
on  or  off  devices  or  device  components,  etc.  Clearly,  two  separate  NCFs  may  make 
conflicting  proposals.  For  instance,  one  NCF  may  want  to  use  a  specific  resource  (e.g., 
server,  switch,  network  link),  while  another  may  want  to  power  it  off,  as  illustrated 
in  Figures  1.1  and  1.2.  Furthermore,  we  assume  that  each  NCF  has  a  corresponding 
utility  or  objective  function  that  its  proposed  changes  seek  to  maximize. 

Ideally,  data  center  operators  seek  to  achieve  network  states  that  jointly  optimize 
their  objectives  given  customer  (i.e.,  tenant)  requirements.  Conflicting  objectives, 
diverse  application  workloads,  varying  application  demand,  and  the  size  of  data  center 
networks  make  achieving  ideal  network  orchestration  challenging,  especially  within 
large-scale  multi-tenant  data  centers. 

1.1.1  Objectives  May  Conflict 

Data  center  orchestration  is  inherently  multi-objective  in  nature.  Even  if  a  data 
center  operator  is  primarily  concerned  with  a  single  objective  such  as  minimizing  link 
congestion,  there  are  natural  tradeoffs  in  other  objectives,  such  as  power  conservation, 
that  must  be  accepted  for  such  optimization. 

To  illustrate  such  tradeoffs,  consider  two  hypothetical  NCF  proposals  for  a  fat-tree 
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NCF-B  NCF-P 


Figure  1.1:  NCF-B  proposes  to  reduce  link  congestion  by  load  balancing  traffic  across 
all  network  links,  whereas  NCF-P  proposes  to  save  power  by  concentrating  traffic  over 
a  small  subset  of  links. 


data  center  architecture  [12],  as  depicted  in  Figure  1.1.  The  fat-tree  architecture  is 
a  physical  network  topology  commonly  used  in  data  networks  representing  a  hier¬ 
archical  multi- rooted  tree  consisting  of  four  levels:  core  switches  (root,  level  =  0), 
aggregate  switches  (level  =  1),  edge  or  top-of-rack  (ToR)  switches  (level  =  2),  and 
host  servers  (leaves,  level  =  3),  with  redundant  links  connecting  devices  at  each  level. 
A  data  center  operator  wishing  to  minimize  link  congestion  may  choose  to  implement 
NCF-B’s  proposal  to  load  balance  traffic  over  the  maximum  number  of  links,  but 
in  doing  so  achieves  a  network  state  that  may  be  suboptimal  when  the  additional 
objective  of  power  conservation  is  considered.  Conversely,  if  a  data  center  opera¬ 
tor  is  solely  concerned  with  power  conservation,  then  he/she  may  opt  to  implement 
NCF-P ’s  proposal,  which  concentrates  traffic  over  a  minimum  subset  of  network  links 
and  switches,  allowing  unused  resources  to  be  powered  down.  But  this  choice  comes 
with  the  tradeoff  of  higher  link  congestion.  Clearly,  a  comprehensive  approach  for 
exploring  the  tradeoff  space  of  multiple  NCF  proposals  is  desirable. 

1.1.2  Application  Workloads  Are  Diverse 

Application  workloads  hosted  by  data  center  networks  stress  a  range  of  different  re¬ 
sources.  Generally,  we  observe  that  these  workloads  are  either  1)  CPU  or  memory  in¬ 
tensive  workloads  (e.g.,  meteorological,  geological,  and  particle  physics  simulations,  or 
other  high  performance  computing  tasks),  commonly  referred  to  as  compute- intensive 
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(Cl)  workloads,  2)  network  or  storage  intensive  workloads  (e.g.,  client-server  appli¬ 
cations  or  Hadoop  and  other  big  data  applications),  commonly  referred  to  as  data- 
intensive  (DI)  workloads,  or  3)  both  (e.g.,  large  big  data  processing  tasks).  Recent 
research  efforts  [13, 14]  confirm  our  intuition  that  Cl  workloads  perform  better  when 
their  subcomponent  processes  and  VMs  are  spread  across  separate  physical  CPU  cores 
and  host  servers,  respectively,  while  DI  workloads  perform  better  when  the  VMs  (or 
processes)  are  placed  on  the  same  host  server  (or  CPU  core).  For  example,  by  placing 
all  VMs  of  a  client-server  (DI)  application  on  the  same  physical  host,  tenfold  increases 
in  throughput  have  been  observed  [13].  In  contrast,  over-contention  of  CPU  resources 
by  VMs  of  Cl  applications  have  been  shown  to  increase  job  completion  time  by  as 
much  as  260%  [13]. 

Therefore,  we  argue  that  an  ideal  data  center  resource  allocation  strategy  should 
consider  the  characteristics  of  individual  application  workloads  in  addition  to  the 
tradeoffs  between  the  various  competing  global  objectives  considered  by  the  data 
center  operator. 

1.1.3  Data  Center  Networks  Are  Large 

The  devices  composing  a  data  center  network  is  typically  in  the  order  of  thousands  [5] , 
making  the  manual  management  of  such  a  network  tedious  and  prone  to  error  and 
inefficiency.  Software-defined  networking  (SDN)  technology  offers  network  adminis¬ 
trators  the  promise  of  convenient,  efficient,  and  accurate  network  management  by 
enabling  the  development  and  deployment  of  automated  NCFs,  i.e.,  SDN  control 
programs,  that  automatically  perform  some  set  of  resource  allocation  or  network  con¬ 
figuration  actions  to  achieve  operator  objectives.  However,  even  automated  network 
management  and  orchestration  strategies  that  leverage  SDN  technology  must  be  scal¬ 
able  to  handle  the  large  application  workloads  and  physical  topologies  characteristic 
of  today’s  data  centers. 
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1.2  Current  Data  Center  Orchestration  Approaches 


Prior  work  in  data  center  orchestration  can  be  distinctly  categorized  as  either  1) 
NCF  synchronization  approaches  [9],  or  2)  NCF  resource  allocation  approaches  [10, 
11].  Synchronization  approaches  like  Statesman  [9]  view  the  underlying  network  as 
a  shared  resource  contested  for  by  several  NCFs,  and  seek  to  arbitrate  control  of 
devices  (e.g.,  switches,  servers)  or  device-components  (e.g.,  switchports,  CPU  cores) 
by  detecting  and  resolving  conflict  among  NCF  proposals,  whereas  existing  resource 
allocation  approaches  like  Corybantic  [10]  and  Athens  [11]  evaluate  the  utility  of 
various  NCF  proposals  in  terms  of  the  allocation  of  physical  network  resources  (e.g., 
hosts,  switches,  network  links,  middleboxes)  to  support  tenant  requirements  (e.g., 
VMs,  application  SLA,  flow  rules,  network  services)  in  order  to  maximize  the  utility 
afforded  to  the  network  operator.  So  while  NCF  synchronization  approaches  detect 
and  resolve  conflict  among  conflicting  NCF  proposals,  and  perhaps  allow  multiple  non¬ 
conflicting  NCF  proposals  to  be  implemented  simultaneously,  NCF  resource  allocation 
approaches  assume  that  NCF  proposals  conflict,  and  aim  to  select  the  proposal  that 
best  achieves  the  objectives  of  the  operator. 

1.2.1  NCF  Synchronization  Approaches 

Synchronization  approaches  to  data  center  orchestration,  and  network  orchestration 
more  generally,  essentially  seek  to  detect  and  resolve  inter-NCF  conflict  (Figure  1.2) 
at  a  low  level  by  ensuring  mutual  exclusion  of  device  or  device-component  control 
among  competing  NCF  proposals. 

In  the  Figure  1.2  conflict  scenario,  NCF-B  wants  to  load  balance  traffic  to  minimize 
link  congestion,  while  NCF-P  wants  to  power  down  one  of  these  switches  to  conserve 
power.  Clearly,  shutting  down  some  switch  forwarding  an  active  flow  may  result 
in  traffic  loss.  If  the  orchestration  program  allows  both  NCFs  to  control  the  same 
resource  by  attempting  to  implement  both  proposals  simultaneously  or  by  cycling 
between  them,  then  network  instability  (oscillation)  may  result  [15],  e.g.,  the  NCFs 
alternate  by  cycling  the  power  on  the  resource. 

Synchronization  approaches  to  network  orchestration  have  several  benefits,  but  a  few 
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Current  proposal  from 
NCF-B  allocates  traffic  on 
path  through  Switch  B. 

Counter-proposal  from 
NCF-P  proposes  to  power 
down  Switch  B  to  conserve 
energy. 


Traffic  lost  at  B, 
if  NCF-P  proposal 
implemented 
while  flow  at  B  is 
still  active 


Figure  1.2:  An  example  of  inter-NCF  conflict.  After  implementing  a  proposal  from 
NCF-B  (“balance”)  to  reduce  link  congestion  by  load  balancing  an  active  flow  from 
Switch  A  to  Switch  C  via  Switch  B  (perhaps  to  avoid  congestion  at  Switch  D),  NCF-P 
(“power”)  makes  a  counter-proposal  to  conserve  power  by  proposing  to  power  down 
Switch  B.  If  NCF-P’s  proposal  is  implemented  while  flows  through  Switch  B  are  still 
active,  a  loss  of  traffic  will  occur  at  Switch  B. 


significant  drawbacks.  One  benefit  is  the  speed  and  scalability  by  which  such  an 
approach  may  be  implemented.  Statesman  effectively  detects  and  resolves  conflict 
among  NCF  proposals,  and  further  implements  a  policy  checker  to  ensure  that  the 
resultant  network  state  complies  with  high-level  network  policy,  all  within  the  order 
of  seconds  for  large-scale  data  centers  containing  thousands  of  devices.  Synchro¬ 
nization  solutions  are  also  well  suited  to  implement  proposals  from  multiple  NCFs 
simultaneously,  as  long  as  the  proposals  are  non-conflicting. 

Real-time  NCF  Synchronization.  NCF  synchronization  solutions  equipped  to 
orchestrate  real-time  NCFs  (RT-NCFs),  or  NCFs  that  operate  with  hard  time  con¬ 
straints  (e.g.,  at  line  speed),  comprise  a  subset  of  NCF  synchronization  that  is  gaining 
increasing  attention  from  the  research  community.  One  recent  work  by  Volpano  et 
al.  [15]  offers  a  unique  approach  in  addressing  the  SDN  orchestration  problem  for 
RT-NCFs  by  representing  NCFs  as  deterministic  finite  transducers  (DFTs).  Where 
Statesman,  Corybantic,  and  Athens  may  be  considered  “black  box”  or  “grey  box”  ap¬ 
proaches,  the  work  by  Volpano  et  al.  employs  a  distinctive  “white  box”  approach,  by 
exposing  internal  NCF  control  logic  as  DFTs.  Unlike  prior  work  in  SDN  orchestration 
that  uses  an  orchestration  program  to  act  as  shim  layer  between  NCF  proposals  and 
the  physical  network  infrastructure,  the  DFT  approach  used  in  [15]  enables  NCFs  to 
impart  their  proposals  directly  on  the  physical  devices  and  device  components  of  the 


7 


infrastructure  in  a  way  that  is  free  of  both  conflict  and  oscillation.  This  approach 
performs  direct  low-level  NCF  orchestration  without  the  need  to  consult  an  external 
orchestration  program,  making  it  suitable  for  the  orchestration  of  RT-NCFs. 

However,  none  of  these  synchronization  approaches  consider  the  exploration  of  trade¬ 
offs  with  respect  to  the  utility  (e.g.,  operator-defined  objective  metrics)  offered  by  a 
range  different  NCF  proposals.  Thus,  we  view  these  approaches  as  largely  orthog¬ 
onal  to  our  work,  as  we  are  primarily  interested  in  exploring  the  utility  of  various 
feasible  NCF  proposals  within  the  tradeoff  space  with  respect  to  competing  operator 
objectives. 

1.2.2  NCF  Resource  Allocation  Approaches 

NCF  resource  allocation  approaches  evaluate  competing  NCF  proposals  in  terms  of 
the  joint  utility  offered  to  the  network  operator.  In  contrast  to  NCF  synchronization 
approaches,  which  may  implement  multiple  non-conflicting  proposals  simultaneously, 
NCF  resource  allocation  approaches  rank  each  proposal  in  terms  of  its  utility  offered, 
and  strive  to  discover  the  one  that  offers  the  best  value  to  the  operator.  Current 
resource  allocation  solutions  [3,4,10,11,16]  overwhelmingly  attempt  to  reduce  the 
multi-objective  nature  of  orchestration  to  a  single-objective  problem  (SOP),  either 
by  optimizing  a  single  objective  function  subject  to  the  others  cast  as  constraints,  as 
done  in  [3,4];  or  by  combining  multiple  objectives  into  a  single  global  cost  function, 
as  done  in  [10,11,16]. 


Single  Objective  Optimization  Approaches 

Pure  SOP  optimization  approaches  such  as  [3,  4]  are  limiting  in  that  only  a  single 
objective  metric,  such  as  bandwidth  allocation,  is  optimized;  all  other  objectives 
(e.g.,  fault  tolerance,  communication  latency,  power  consumption,  etc.)  are  cast  as 
constraints.  Although  using  such  approaches  permit  the  use  of  specialized  heuristics 
to  quickly  and  scalably  achieve  resource  allocations  that  optimize  the  selected  metric, 
they  are  limiting  due  to  the  fact  that  only  a  single  metric  is  considered. 


Global  Cost  Minimization  Approaches  (MOP-to-SOP  Reduction) 

Global  cost  minimization  approaches  like  [10, 11, 16]  recognize  the  inherently  multi¬ 
objective  nature  of  data  center  orchestration,  but  instead  of  using  a  pure  multi¬ 
objective  optimization  problem  (MOP)  formulation  for  orchestrating  multiple  NCFs, 
they  instead  attempt  to  combine  the  individual  NCF  utility  metrics  representing 
distinct  operator  objectives,  into  a  single,  consolidated  global  objective  metric,  that 
is  intended  to  represent  the  net  “cost”  of  implementing  a  proposal  to  the  data  center 
operator.  The  problem  with  these  types  of  approaches  mainly  concerns  the  difficulty 
in  effective  implementation.  Because  all  of  the  NCF  utility  metrics  are  combined 
into  a  single  global  cost  function,  some  method  of  NCF  weighting  must  be  used  to 
ensure  that  the  proposal  selected  (i.e.,  the  discovered  proposal  that  yields  the  lowest 
value  of  the  global  cost  function)  actually  achieves  the  goals  of  the  operator.  But 
the  operator  may  not  have  a  particular  set  of  NCF  weights  a  priori,  and  it  may  take 
several  successive  runs  using  different  weightings  in  order  to  find  weights  that  produce 
a  desirable  proposal.  Furthermore,  certain  objectives,  such  as  bandwidth  allocation 
and  fault  tolerance,  may  be  disparate  or  orthogonal  with  respect  to  one  another. 
Thus,  it  may  not  be  possible  to  combine  such  objectives  into  a  single  global 
objective  function  in  a  way  that  preserves  operator  intent. 

1.2.3  Summary 

Because  we  are  primarily  interested  in  comprehensively  exploring  the  utility  of  various 
feasible  proposals  within  the  tradeoff  space  with  respect  to  competing  NCFs,  we 
choose  to  focus  on  the  formulation  of  the  data  center  orchestration  problem  as  a 
NCF  resource  allocation  problem.  Hence,  our  work  is  motivated  by  existing  resource 
allocation  approaches,  which  largely  attempt  to  optimize  SOP  formulations  of  the 
orchestration  problem.  However,  although  SOP  formulations  of  the  orchestration 
problem  permit  faster  solutions,  solving  a  SOP  yields  only  a  single  solution  within 
a  potentially  vast  tradeoff  space.  Furthermore,  many  current  approaches  use  search 
algorithms  based  on  greedy  heuristics  [4,10,11],  which  may  prematurely  converge  to 
suboptimal  local  maxima  when  applied  to  non-convex  optimization  problems. 

Thus,  we  believe  it  prudent  to  explore  an  alternative  formulation  based  on  the  classical 
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multi-objective  optimization  problem  (MOP)  literature  [17-19],  where  the  goal  is  to 
enumerate  a  diverse  set  of  Pareto-optimal  solutions  [19]  among  competing  NCFs,  i.e., 
no  solution  can  be  improved  in  any  objective  without  causing  a  degradation  in  at 
least  one  other  objective. 

1.3  Emerging  Trends  For  Rethinking  Data  Center 
Orchestration 

Most  of  the  limitations  with  current  techniques  for  data  center  orchestration  stem 
from  assumptions  made  about  the  types  of  application  workloads  to  be  hosted  or  the 
specific  objective  metrics  to  be  optimized.  As  a  result,  the  large  body  of  work  related 
to  orchestration  and  resource  allocation  in  data  center  networks  mainly  consists  of 
disparate  specialized  solutions  for  specific  problems  rather  than  a  cohesive  body  of 
general  solutions  applicable  to  a  wide  range  of  problems.  However,  due  to  the  over¬ 
whelming  industry  migration  from  the  smaller,  traditional,  task-specific  data  centers 
of  the  past,  to  the  larger,  cloud-based,  multi-tenant  data  centers  of  the  future,  data 
center  operators  are  hosting  an  increasingly  wide  range  of  diverse  applications  with 
different  requirements  and  optimization  preferences. 

There  have  been  several  recent  directions  arising  from  the  research  community  to 
address  these  new  challenges:  multi-objective  orchestration,  application-aware  or¬ 
chestration,  and  scalable  orchestration. 

1.3.1  Multi- Objective  Orchestration 

Corybantic  [10],  Athens  [11],  and  NFC  [16]  are  recent  data  center  orchestration  pro¬ 
posals  that  illustrate  the  continuing  trend  towards  orchestration  solutions  that  at¬ 
tempt  to  jointly  optimize  multiple  objectives. 

In  Corybantic  [10],  the  authors  view  SDN  orchestration  as  a  multi-objective  problem, 
and  attempt  to  reduce  it  to  a  SOP  transforming  the  performance  criteria  of  each 
NCF,  or  module  as  called  by  the  authors,  into  a  common  representative  currency 
interpretable  by  the  orchestrator.  Corybantic  proceeds  by  soliciting  each  NCF  for  its 
proposed  network  state,  evaluating  the  cost  each  proposal  in  terms  of  the  common 
currency,  and  finally  selecting  the  most  cost-effective  proposal  for  implementation. 
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Unfortunately,  the  Corybantic  approach  suffers  from  three  significant  pitfalls:  1) 
Expressing  the  performance  criteria  of  disparate  NCFs  in  terms  of  a  singular,  universal 
metric  may  not  be  feasible  clue  to  disparate  objectives.  2)  The  range  and  diversity 
of  candidate  network  states  are  limited  to  proposals  made  by  specialized  NCFs,  and 
hence  unlikely  to  achieve  mutually  beneficial  inter-NCF  compromises.  Hence  a  large 
portion  of  the  state  space  may  be  unexplored.  3)  Corybantic  uses  a  iterative  selection 
process  based  upon  a  single  greedy  criterion,  i.e.  it  selects  the  NCF  proposal  that 
yields  the  lowest  “common  currency,”  a  single  global  metric  that  each  NCF  utility 
metric  is  reduced  to  via  an  operator-specified  weighting.  Hence,  it  is  essentially  a  hill 
climbing  approach,  and  thus  may  converge  suboptimally  towards  a  local  minima. 

Another  related  work  that  uses  a  multi-objective  approach  towards  orchestration,  is 
Athens  [11].  Athens  builds  on  Corybantic  by  suggesting  a  family  of  voting  procedures 
available  to  each  NCF,  better  enabling  them  to  reconcile  disparate  objectives  by 
soliciting  for  each  other's  feedback  in  terms  of  “votes”.  So  while  Athens 
partially  addresses  the  first  of  the  Corybantic  pitfalls  described  above,  it  is  still 
subject  to  pitfalls  2)  and  3)  of  Corybantic  as  described  in  the  previous  paragraph. 

Finally,  the  authors  of  the  recent  Network  Function  Center  work  [16],  while  using  a 
similar  MOP  to  SOP  reduction  as  Corybantic  and  Athens  in  their  problem  formula¬ 
tion,  recognize  the  pitfalls  of  solely  using  greedy  NCF-specihc  heuristics  to  perform 
non-convex  SOP  optimization,  and  choose  to  use  genetic  algorithms  to  help  overcome 
the  issue  of  premature  algorithm  convergence  to  local  optima.  However,  although 
the  Network  Function  Center  provides  better  solutions  to  non-convex  SOPs,  it  is  still 
limiting  in  its  formulation  as  a  SOP  for  reasons  described  previously.  It  is  also  un¬ 
clear  how  well  the  Network  Function  Center  approach  enumerates  the  tradeoff  space 
between  multiple  objectives,  as  all  SOP  component  metrics  and  weightings  are  deter¬ 
mined  a  priori  and  only  a  single  resultant  network  state  is  produced  as  a  solution. 

In  summary,  the  chronological  evolution  of  related  work  in  data  center  orchestra¬ 
tion  from  Corybantic  (2013)  to  Athens  (2014)  to  Network  Function  Center  (2015), 
represents  a  strong  research  direction  towards  multi-objective  formulations  for  the 
orchestration  problem  and  better  approaches  for  finding  good  s  olutions.  However, 
what’s  still  missing  from  the  state-of-the-art  is  a  pure  MOP  formulation  of  the  or- 
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chestration  problem  and  a  corresponding  approach  to  comprehensively  explore  the 
tradeoffs  between  multiple  NCFs,  as  proposed  in  this  dissertation. 

1.3.2  Application- Aware  Orchestration 

As  opposed  to  focusing  solely  on  optimizing  the  objectives  of  data  center  operators, 
such  as  cumulative  resource  usage,  there  is  an  increasing  trend  in  the  research  com¬ 
munity  to  leverage  additional  opportunities  to  intelligently  orchestrate  application 
workloads  to  improve  per  application  performance  of  all  hosted  applications.  Specifi¬ 
cally,  recent  studies  [13,14]  demonstrate  that  different  types  of  workloads  contend  for 
different  types  of  resources.  Consequently  how  VMs  of  an  application  are  relatively 
positioned  can  significantly  impact  the  performance  of  an  application.  For  example, 
it  would  be  advisable  not  to  co-locate  VMs  of  computation  intensive  workloads  to 
avoid  unnecessary  CPU  contention  while  at  the  same  time,  position  VMs  of  the  same 
data  intensive  workload  as  close  as  possible  to  reduce  both  bandwidth  contention  and 
communication  latency  [14]. 

But  although  the  concept  of  application-aware  orchestration  is  not  new,  existing  so¬ 
lutions  that  claim  to  support  application-aware  orchestration,  such  as  [20-22],  only 
do  so  with  respect  to  certain  types  of  applications.  For  instance,  [20]  only  considers 
data  intensive  applications,  [21]  strives  to  minimize  network  traffic  subject  to  physical 
server  resource  constraints,  similar  to  [23],  and  [22]  focuses  primarily  on  computation 
and  memory  intensive  high-performance  computing  applications.  Reference  [24] 
considers  fair-ness  in  multi-tenant  datacenter  environments  by  searching  for 
placements  of  tenant  applications  that  minimize  the  sum  of  network  diameter  of  all 
tenants,  but  does  not  consider  different  application  types.  And  while  NETMAP  [25] 
considers  tenant  applications  with  different  requirements,  it  only  considers  network 
traffic  and  fault  tolerance  requirements  of  applications,  and  thus  does  not  speak  to  the 
placement  of  computationally  intensive  or  storage  intensive  applications. 

None  of  these  provides  a  general  solution  for  best  allocating  different  types  of  indi¬ 
vidual  application  workloads  (e.g.,  computation  or  memory  intensive  vs.  data  or 
network  intensive)  within  a  shared  multi-tenant  data  center  to  jointly  achieve  tenant 
(cloud  customer)  and  operator  (cloud  provider)  objectives. 
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1.3.3  Scalable  Orchestration 


Maintaining  fast  speeds  of  execution  (e.g.,  order  of  seconds)  for  online  orchestration 
algorithms,  i.e.  algorithms  that  allocate  data  center  resources  to  new  tenant  appli¬ 
cation  workload  requests  as  they  arrive,  is  yet  another  clear  trend  in  the  research 
community,  especially  as  the  size  of  data  center  network  topologies  approaches  large- 
scale  (i.e.,  thousands  of  physical  hosts,  tens  of  thousands  of  VM  slots).  The  reason 
for  this  trend  is  likely  tied  to  the  overwhelming  increase  in  the  number  and  size  of 
application  workloads  hosted  by  cloud  data  centers,  as  well  as  the  increases  in  size 
(i.e.,  number  of  devices)  of  the  physical  topologies  that  make  up  the  networks  of  these 
data  centers. 

However,  despite  this  increasing  need  for  scalable  orchestration  approaches,  recent 
work  has  been  struggling  to  maintain  fast  resource  allocation  times  as  the  size  of 
workloads  and  physical  topologies  approach  large-scale,  especially  when  multiple  ob¬ 
jectives  are  considered.  Specifically,  the  approach  in  [23]  optimally  minimizes  link 
congestion  in  the  order  of  minutes  for  1024  VM  slots,  but  may  be  slowed  to  the  order 
of  hours  when  other  objectives  are  considered  [16].  In  [16]  a  genetic  algorithm  is  used 
to  approximate  a  minimal  solution  to  a  five-component  weighted  cost  function  in  the 
order  of  tens  of  seconds,  but  only  scales  to  a  relatively  small  64-server  topology.  Other 
online  VM  placement  solutions,  such  as  [26]  and  [27],  consider  large  and  distributed 
data  center  scenarios,  respectively,  but  do  not  explicitly  state  execution  times  for 
these  scenarios.  Single-objective  orchestration  approaches  such  as  [3,4]  achieve  exe¬ 
cution  time  in  the  order  of  seconds  for  2000+  host,  40000+  VM  slot  topologies,  but 
are  limiting  in  that  they  only  approximate  a  solution  to  a  single  objective  function, 
namely  bandwidth  usage,  given  application  survivability  constraints.  As  such,  while 
the  approaches  in  [3,  4]  have  proven  scalable,  they  are  not  generally  extensible  to 
multi-objective  scenarios. 

What’s  needed  is  a  scalable  orchestration  solution  extensible  to  a  multitude  of  dis¬ 
parate  objectives  that  still  achieves  fast  execution  times  (e.g.,  order  of  seconds)  for 
allocating  large  application  workloads  (e.g.,  hundreds  of  VMs)  within  large  data  cen¬ 
ter  networks  (e.g.,  thousands  of  servers). 
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1.4  Key  Design  Principles 


This  dissertation  describes  the  design,  implementation,  and  evaluation  of  new  ap¬ 
proaches  that  address  two  key  challenges:  1)  achieving  visibility  and  consideration 
of  potential  tradeoffs  among  a  multitude  of  NCFs  to  accommodate  a  wide  range  of 
operator  objectives  and  network  conditions,  and  2)  achieving  efficient  allocations  of 
network  resources  to  application  workloads  in  a  manner  that  jointly  considers  the 
objectives  of  the  operator,  and  the  preferences  of  each  individual  application.  There 
are  two  key  design  principles  that  guide  the  design  of  these  new  approaches: 

Comprehensive  Tradeoff  Exploration  Principle.  A  data  center  orchestration 
program  at  minimum  must  solicit  and  rank  proposals  from  all  NCFs.  If  an  operator 
knows  a  priori  how  to  jointly  model  multiple  objectives  with  a  single  ranking  metric, 
the  orchestrator  may  optimize  the  allocation  based  on  the  metric  in  order  to  find  a 
“best  compromise”  solution  for  all  the  objectives.  However,  this  approach  places  a 
heavy  burden  on  the  operator  to  create  the  right  ranking  model  for  his/her  network. 
More  importantly,  it  is  an  open  question  whether  a  search  based  on  such  joint  models 
can  cover  the  potentially  vast  tradeoff  space  between  multiple  objectives.  Table  1.1 
illustrates  the  shortcomings  of  current  orchestrators  with  respect  to  three  desirable 
design  considerations.  Note  that  none  of  the  current  solutions  offer  a  diverse  set  of 
tradeoffs  among  separate  NCFs. 

Thus,  we  argue  that  formulating  the  orchestration  problem  as  a  MOP  that  explicitly 
represents  each  distinct  operator  objective,  is  a  more  effective  way  to  discover  globally 
optimal  compromises  when  all  NCFs  and  operator  objectives  are  considered.  By 
maintaining  a  wide  range  of  solution  candidates  and  applying  the  concepts  of  natural 
evolution,  i.e.,  performing  mutations  and  recombinations  of  high-fitness  candidates, 
a  diverse  set  of  nondominated  proposals  may  be  generated  and  presented  to  the 
operator  for  consideration.  This  is  better  than  proposing  a  single  “best”  allocation, 
since  operator  priorities  are  likely  to  be  fluid  to  accommodate  rapidly  changing  tenant 
demands  and  network  conditions. 

Logical  Application  Workload  (LAW)  Principle.  Because  different  types  of 
applications  contend  for  different  types  of  resources,  for  a  given  application  workload, 
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[Sun  et  al.  2014] 
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[Volpano  et  al.  2014] 

o 

o 

• 

Athens 

[AuYoung  et  al.  2014] 

o 

€ 

€ 

Table  1.1:  Current  network  orchestration  solutions  in  the  context  of  three  design  con¬ 
siderations,  including  the  key  design  principle  of  comprehensive  tradeoff  exploration, 
which  none  of  these  solutions  offer. 

some  feasible  resource  allocations  for  it  may  be  better  than  others.  For  instance,  two 
distinct  allocations  for  a  given  workload  may  use  the  same  number  of  VM  slots,  but 
one  allocation  may  be  better  than  the  other  if  the  relative  positioning  of  its  VMs 
achieves  strictly  less  resource  contention  while  maintaining  the  same  values  of  global 
operator  objectives  with  respect  to  the  other. 

Moreover,  we  recognize  that  by  independently  simulating  the  desired  placements  of  in¬ 
dividual  applications  in  a  contention-free  environment,  e.g.,  an  empty  physical  topol¬ 
ogy,  we  can  precompute  the  ideal  allocations  of  an  application’s  VMs  for  a  wide  range 
of  workload  profiles,  and  then  subsequently  treat  the  collection  of  individual  “per  VM” 
allocations  for  a  workload  as  single  atomic  unit.  We  term  each  of  these  atomic  units 
a  logical  application  workload  (LAW).  LAWs  are  important  because  current  per  VM 
approaches  for  placing  workloads  into  data  centers  consisting  of  thousands  of  servers 
does  not  scale  very  well.  Specifically,  execution  times  for  such  resource  allocation 
approaches  range  from  the  order  of  hours  at  worst  to  the  order  of  tens  of  seconds  at 
best,  and  these  times  only  get  worse  as  the  sizes  of  application  workloads  and  phys¬ 
ical  topologies  increase,  which,  according  to  current  trends  in  data  center  growth,  is 
already  happening  [2], 
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By  allocating  LAWs,  which  typically  consist  of  tens  or  even  hundreds  of  VMs,  as  op¬ 
posed  to  individual  VMs,  algorithms  for  efficiently  allocating  large  data  center  work¬ 
loads  become  scalable,  permitting  sub-second  placement  times.  And  since  LAWs  are 
explicitly  constructed  to  minimize  contention  from  the  perspective  of  the  VMs  belong¬ 
ing  to  the  application,  resultant  allocations  comprised  of  successive  LAW  placements 
come  with  some  guarantees  of  application  performance,  which  may  assist  operators 
in  ensuring  that  they  meet  required  SLAs. 


1.5  A  Scalable  Approach  To  Comprehensive  and 
Application- Aware  Data  Center  Orchestration 

Following  the  key  design  principles,  we  propose  a  scalable  approach  to  comprehensive 
and  application-aware  data  center  orchestration  consisting  of  two  conceptually  dis¬ 
tinct  but  thematically  coherent  algorithms:  1)  an  Evolutionary  Algorithm  for  SDN 
Orchestration  (EASO)  that  comprehensively  explores  the  tradeoff  space  among  mul¬ 
tiple  objectives,  and  2)  a  scalable  algorithm  for  rapidly  constructing  and  allocating 
Logical  Application  Workloads  (LAWs)  to  jointly  achieve  operator  objectives  and  ap¬ 
plication  preferences.  We  have  built  prototypes  for  both  algorithms  and  evaluated 
them  with  realistic  workload  traces  in  simulated  large-scale  data  center  environments. 

EASO  [28]:  Orchestrating  Network  Control  Functions  via  Comprehensive 
Trade-off  Exploration.  In  this  contribution,  we  formulate  SDN  orchestration  as  a 
multiobjective  optimization  problem,  and  present  an  evolutionary  approach  designed 
to  explore  the  tradeoff  space  between  multiple  NCFs  comprehensively  and  avoid  local 
optima.  For  an  instance  of  the  VM  allocation  problem  subject  to  three  independent 
NCFs  optimizing  network  survivability,  bandwidth  efficiency,  and  power  consumption, 
respectively,  we  demonstrate  that  our  approach  can  enumerate  a  wider  range  of,  and 
potentially  better,  solutions  than  current  orchestrators,  for  data  centers  with  hundreds 
of  switches,  thousands  of  servers,  and  tens  of  thousands  of  VM  slots. 

LAW  [29]:  An  Application- Aware  Approach  to  Scalable  Online  Placement 
of  Data  Center  Workloads.  Currently,  operators  allocate  workload  VMs  primar¬ 
ily  in  an  application-agnostic  fashion,  focusing  on  minimizing  total  resource  usage. 


16 


In  this  contribution,  we  first  show  that  such  allocations  can  be  suboptimal  when 
intra-  and  inter-application  resource  contention  is  present,  and  then  present  a  new 
application-aware  approach  that  explicitly  models  the  resource  preferences  of  indi¬ 
vidual  workloads.  Further,  we  propose  a  new  logical  application  workload  (LAW) 
abstraction  to  enable  precomputation  of  the  required  relative  positioning  of  an  appli¬ 
cation’s  VMs  and  allocation  of  these  VMs  in  a  single  atomic  step,  leading  to  online 
algorithms  that  are  one  order  of  magnitude  faster  than  existing  per  VM  placement 
solutions.  We  then  develop  a  statistical  extension  of  LAW  to  add  flexibility  in  char¬ 
acterizing  application  requirements  and  to  support  prioritization  of  workloads.  Using 
realistic  workload  traces  and  physical  topologies,  we  evaluate  our  algorithms  in  a  sim¬ 
ulated  large-scale  data  center  setting,  and  demonstrate  their  performance  advantages 
and  potential  tradeoffs  versus  existing  solutions. 

The  remainder  of  this  dissertation  is  organized  as  follows:  Chapter  2  presents  EASO, 
implemented  via  an  evolutionary  algorithm  for  comprehensively  exploring  tradeoffs 
between  multiple  NCFs.  Chapter  3  represents  the  LAW  principle,  implemented  via 
scalable  LAW  construction  and  allocation  algorithms  to  achieve  specified  operator  ob¬ 
jectives  while  minimizing  resource  contentions  among  individual  applications.  Chap¬ 
ter  4  discusses  contributions  of  each  separate  principle  when  used  independently,  and 
also  describes  how  the  two  principles  may  be  combined  to  achieve  the  benefits  of 
both.  We  then  present  ideas  for  future  work  and  provide  conclusions. 
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CHAPTER  2: 

Orchestrating  Network  Control  Functions  via 
Comprehensive  Trade-off  Exploration 


This  chapter  focuses  on  the  goal  of  providing  several  workable,  alternative  solutions 
to  the  orchestration  problem  to  a  decision  maker.  We  argue  that  this  goal  is  best 
achieved  by  comprehensively  exploring  the  tradeoff  space  among  multiple  network 
control  functions  (NCFs),  each  designed  to  achieve  a  specific  operator  objective.  Sim¬ 
ply  striving  to  minimize  a  global  cost  function  representative  of  cumulative  resource 
usage  is  not  sufficient,  as  disparate  objectives,  such  as  fault  tolerance  and  security 
enforcement,  exist  and  are  important  to  tenants  and  data  center  operators  alike.  In¬ 
stead,  we  choose  to  formulate  the  orchestration  problem  as  a  pure  multi-objective 
problem  that  explicitly  represents  each  operator  objective  individually,  and  subse¬ 
quently  propose  EASO,  an  evolutionary  approach  that  provides  a  set  of  desirable 
tradeoffs  to  the  data  center  operator,  given  a  set  of  tenant  application  workload  re¬ 
quirements. 

For  an  instance  of  the  orchestration  problem  subject  to  three  independent  NCFs 
attempting  to  optimize  network  survivability,  bandwidth  efficiency,  and  power  con¬ 
servation,  respectively,  we  demonstrate  that  our  approach  can  enumerate  a  wider 
range  of,  and  potentially  better,  solutions  than  current  orchestrators,  for  data  centers 
with  hundreds  of  switches,  thousands  of  servers,  and  tens  of  thousands  of  VM  slots. 


2.1  Introduction 

In  this  contribution,  we  defer  the  detection  and  resolution  of  NCF  conflicts  to  other 
work,  such  as  [9,15],  and  choose  to  focus  on  exploring  the  utility  of  various  feasible 
NCF  proposals  within  the  tradeoff  space  with  respect  to  competing  NCFs.  We  define 
this  notion  of  utility  with  respect  to  the  NCFs  themselves,  as  done  in  [10,11],  under 
the  assumption  that  each  independent  NCF  has  a  corresponding  utility  function  that 
it  seeks  to  maximize. 
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Using  this  notion  of  NCF  utility,  we  define  the  NCF  orchestration  problem  as  a  multi¬ 
objective  optimization  problem  (MOP)  where  each  NCF  utility  function  comprises  a 
component  of  the  optimization  problem.  Prior  work  [3,4,10,11,16],  overwhelmingly 
attempts  to  reduce  the  multi-objective  nature  of  orchestration  to  a  single-objective 
problem  (SOP),  either  by  casting  multiple  objectives  in  terms  of  a  single  global  utility 
function  [10,11,16],  or  by  optimizing  one  objective  function  subject  to  the  others  cast 
as  constraints  [3,4], 

Although  SOP  formulations  of  the  orchestration  problem  permit  faster  solutions, 
solving  a  SOP  yields  only  a  single  solution  within  a  potentially  vast  tradeoff  space. 
Furthermore,  many  current  approaches  use  search  algorithms  based  on  greedy  heuris¬ 
tics  [4, 10, 11],  which  may  prematurely  converge  to  suboptimal  local  maxima  when 
applied  to  non-convex  optimization  problems.  Thus,  we  believe  it  prudent  to  explore 
an  alternative,  more  basic  formulation  based  on  the  classical  multi-objective  opti¬ 
mization  problem  (MOP)  literature  [17-19],  where  our  goal  is  to  enumerate  a  diverse 
set  of  efficient  solutions  among  competing  NCFs,  i.e.  each  solution  in  the  set  can¬ 
not  be  improved  in  any  objective  without  causing  degradation  in  at  least  one  other 
objective. 

In  the  following  sections,  our  contribution  is  three- fold.  First,  we  present  a  novel 
MOP  problem  formulation  for  SDN  orchestration.  Second,  we  describe  an  evolution¬ 
ary  approach  for  enumerating  a  wide  range  of  efficient  NCF  proposals,  scalable  to 
topologies  of  thousands  of  hosts  and  hundreds  of  switches.  Third,  we  present  new 
metrics  and  use  them  to  evaluate  our  approach  vs.  current  solutions  in  the  context 
of  the  VM  allocation  problem.1 

2.2  Rethinking  Orchestration 

Orchestrating  multiple  NCFs  to  achieve  and  maintain  stable  and  desirable  network 
operating  conditions  face  major  technical  challenges.  To  illustrate  them,  consider  the 
following  simplistic  scenario  where  a  data  center  must  allocate  virtual  machines  (VMs) 
that  can  be  supported  by  its  physical  infrastructure  to  a  set  of  tenant  applications. 

1  Although  we  focus  on  VM  allocation  NCFs  in  this  work,  we  are  confident  that  our  approach  is 
generally  applicable  for  orchestrating  NCFs  of  any  virtual  network  function. 
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rrm  cmd  urn  "ffr 


(a)  NCF-S 


Tenant  ToR  WCS  (avg):  0.0 
Mean  Link  BW:  68.2  Mbps 

Power  Used:  23  units 


mmws 


Tenant  ToR  WCS  (avg):  0.13 
Mean  Link  BW:  86.4  Mbps 

Power  Used:  17.5  units 


(b)  NCF-B  (c)  NCF-P 

Figure  2.1:  Example  proposals  from  NCF-S,  NCF-B,  and  NCF-P.  Green  slots  are 
for  VMs  of  Rl,  red  R2,  and  gold  R3.  We  assume  that  each  server,  ToR  switch, 
aggregation  switch,  or  core  switch  uses  1,  2,  2.5,  and  3  units  of  power,  respectively. 


We  have  chosen  to  study  VM  allocation  because  1)  it  is  one  of  the  important  Erst 
steps  of  any  data  center  operation,  and  2)  the  problem  has  an  extensive  collection  of 
prior  work  for  us  to  compare  to  our  contribution. 

In  order  to  specify  requirements  for  instances  of  the  SDN  orchestration  problem, 
we  use  <VM,  BW>  pairs,  where  the  Erst  value  in  the  tuple  represents  the  number 
of  VMs  required,  and  the  second  value  represents  the  inter- VM  bandwidth  (BW) 
requirement.  Suppose  three  independent  tenant  applications  Rl,  R2,  and  R3  have 
requirements  <5,  50  Mbps>,  <5,  100  Mbps>,  and  <5,  150  Mbps>  respectively. 
Suppose  the  underlying  physical  infrastructure  has  a  simple  tree  topology,  consisting 
of  one  core  switch  as  root,  two  aggregation  switches  and  four  top  of  rack  (ToR) 
switches,  four  host  servers  per  ToR  switch,  and  two  VM  slots  per  host  server,  where 
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a  “VM  slot”  is  defined  as  a  standard  physical  resource  unit  (e.g.,  CPU,  Memory) 
provisioned  to  a  VM.  Therefore,  the  VMs  are  to  be  allocated  against  4  x  4  x  2  =  32 
possible  slots. 

Suppose  the  data  center  utilizes  three  different  NCFs  simultaneously:  NCF-S,  NCF-B 
and  NCF-P.  NCF-S  is  designed  to  maximize  the  applications’  worst  case  survivabil¬ 
ity  (WCS)  over  failures  of  ToR  switches  and  physical  servers  with  a  preference  for 
spreading  VMs  of  each  application  across  separate  racks  and  servers  (Figure  2.1a). 
Specifically,  an  application’s  ToR  WCS  is  defined  as  the  fraction  of  its  VMs  that  sur¬ 
vive  a  single  worst  case  ToR  switch  failure  [5].  In  this  scenario,  we  use  the  mean  ToR 
WCS  (across  all  tenants)  as  a  representative  metric  for  NCF-S.  NCF-B  is  designed  to 
minimize  the  mean  link  BW  reservation,  allocated  using  the  hose  model  [30].  Using 
the  hose  model,  each  network  link  utilized  by  a  tenant  application  divides  the  tenant 
application  subtree  into  two  components,  and  the  bandwidth  needed  on  this  link  for 
the  tenant  application  is  determined  by  multiplying  the  per-VM  bandwidth  required 
by  the  application  and  the  number  of  VMs  on  the  smaller  of  the  two  components, 
as  done  in  [30].  Thus,  NCF-B  has  a  preference  for  consolidating  VMs  of  the  same 
application  as  close  to  one  another  as  possible,  e.g.,  placing  on  same  server  or  rack 
(Figure  2.1b).  NCF-P  aims  to  minimize  total  power  consumption  by  placing  VMs  on 
the  fewest  number  of  racks  and  servers,  thus  allowing  unused  resources  to  be  pow¬ 
ered  down  (Figure  2.1c).  One  candidate  proposal  of  allocation  can  dominate,  i.e.„ 
be  strictly  better  than,  another  if  it  achieves  better  performance  for  at  least  one  ob¬ 
jective  and  no  worse  performance  for  each  other  objective.  However,  sometimes,  two 
candidate  proposals  cannot  be  simply  ranked  against  each  other  as  each  is  better  for 
a  different  objective;  in  this  case,  we  say  they  are  nondominated  with  respect  to  each 
other,  which  is  the  case  for  any  pair  of  proposals  in  Figure  2.1. 

An  orchestrator  at  minimum  must  solicit  and  rank  candidate  proposals  from  all  NCFs. 
Clearly,  the  network  cannot  execute  all  three  proposals  in  Figure  2.1.  Nor  should  it 
be  asked  to  somehow  accommodate  each  by  time-slicing  them.  If  an  operator  knows 
a  priori  how  to  jointly  model  the  three  objectives  with  a  single  ranking  metric,  the 
orchestrator  may  optimize  the  allocation  based  on  the  metric  in  order  to  find  a  “best 
compromise”  solution  for  all  the  objectives.  However,  this  approach  places  a  heavy 


22 


burden  on  the  operator  to  create  the  right  ranking  model  for  his/her  network.  More 
importantly,  it  is  an  open  question  whether  a  search  based  on  such  joint  models  can 
cover  the  potentially  vast  tradeoff  space  between  the  objectives.  Compounding  the 
problem  is  that  the  NCFs  are  supposed  to  come  from  third-party  vendors,  and  thus 
are  likely  to  be  black-box  solutions  to  the  operator.  The  operator  could  conceivably 
collaborate  with  the  NCF  vendors  to  create  “grey-box,”  or  even  “white-box”  solutions 
where  he/she  has  access  to  the  internal  logic  of  the  NCFs.  While  doing  so  may  reduce 
the  search  space  for  finding  an  acceptable  compromise,  it  remains  a  challenge  for  an 
orchestrator  to  adequately  explore  the  multi-NCF  tradeoff  space  for  large  network 
configurations. 


Tenant  ToR  WCS  (avg): 
Mean  Link  BW: 

Power  Used: 


0.20 

86.4  Mbps 

17.5  units 


Tenant  ToR  WCS  (avg):  0.60 
Mean  Link  BW:  168.2  Mbps 

Power  Used:  22  units 


(a)  Mutated  Power  Cons. 


(b)  Recombined  BW /WCS 


Figure  2.2:  Mutated  power  conservation  proposal  and  recombined  BW  conservation 
and  survivability  proposal. 


In  prior  work,  e.g.,  Athens  [11]  and  Corybantic  [10],  all  candidate  solutions  are  exclu¬ 
sively  generated  by  the  NCFs.  Assuming  that  each  NCF  generates  one  proposal,  the 
initial  population  in  the  context  of  this  example  is  limited  to  the  three  proposals  de¬ 
picted  in  Figure  2.1.  It  may  be  possible  for  each  NCF  to  generate  multiple  proposals, 
but  even  so,  it  seems  unrealistic  to  expect  a  specialized  NCF  to  generate  mutually 
beneficial  compromises  without  knowledge  or  understanding  of  the  performance  crite¬ 
ria  of  the  others.  Corybantic  and  Athens  then  proceed  by  selecting  the  NCF  proposal 
that  maximizes  some  global  utility  function  (e.g.,  votes,  cost-effectiveness),  and  may 
iteratively  solicit  the  NCFs  for  incremental  counter-proposals  to  the  previously  se¬ 
lected  allocation  until  the  greedy  criterion  cannot  further  be  improved  by  any  NCF 
proposal. 
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The  problem  here  is  that  even  if  a  specialized  NCF  is  able  to  make  effective  counter¬ 
proposals,  which  may  itself  be  challenging,  the  region  of  the  search  space  enumerated 
by  such  proposals  is  limited  to  what  is  reachable  from  the  previously  selected  proposal. 
For  instance,  if  the  power  conservation  proposal  depicted  in  Figure  2.1c  is  selected 
initially,  then  it  may  be  the  case  that  while  the  proposal  in  Figure  2.2a  is  reachable 
via  a  series  of  incremental  counter-proposals,  the  proposal  in  Figure  2.2b  may  not  be. 
As  a  result,  the  final  proposal  obtained  may  not  be  globally  optimal,  i.e. ,  it  may  be 
dominated  by  another  feasible,  but  unexplored  allocation. 

In  contrast,  we  argue  that  mutation  and  recombination  of  proposals  in  an  NCF- 
agnostic  manner,  in  addition  to  NCF-specific  heuristic  mutations,  and  subsequent 
evaluation  as  a  MOP  comprised  of  distinct  NCF  performance  criteria,  is  a  more  ef¬ 
fective  way  to  discover  “globally  optimal”  compromises.  A  mutation  is  similar  to  a 
counter-proposal  in  [10,11]  in  that  it  is  a  small-scale  change  to  some  previous  alloca¬ 
tion  in  an  effort  to  guide  the  search  towards  a  local  optimum,  whereas  a  recombination 
is  large-scale  change  produced  by  combining  desirable  elements  of  two  different  states 
with  the  aim  of  exploring  a  new  frontier  of  the  state  space  to  subsequently  discover 
new  local,  and  possibly  global,  optima.  Furthermore,  by  maintaining  a  wide  range  of 
solution  candidates  and  applying  the  concepts  of  natural  evolution,  i.e.,  performing 
mutations  and  recombinations  of  high-fitness  candidates,  a  diverse  set  of  nondomi- 
nated  allocation  alternatives  may  be  generated  and  presented  to  the  network  operator 
for  consideration.  If  an  operator  requirements  are  fluid  then  this  approach  offers  mul¬ 
tiple  prospective  proposals  for  consideration  as  opposed  to  a  single  “best”  one,  which 
may  be  undesirable. 

For  instance,  consider  a  potential  mutation  of  the  power  conservation  proposal,  and 
a  potential  recombination  of  the  survivability  and  BW  conservation  proposals,  pre¬ 
sented  in  Figure  2.2.  Note  that  while  both  the  mutated  power  conservation  proposal 
(Figure  2.2a)  and  the  recombined  survivability  and  BW  conservation  proposal  (Figure 
2.2b)  each  dominate  a  predecessor  state  (Figures  2.1c  and  2.1b,  respectively),  these 
proposals  are  nondominated  with  respect  to  one  another.  Although  the  proposal  in 
Figure  2.2b  offers  better  ToR  WCS,  it  uses  more  power  and  BW  than  the  proposal  in 
Figure  2.2a.  Since  neither  proposal  dominates  the  other,  both  should  be  maintained 
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Tenant  ToR  WCS  (avg):  0.40 
Mean  Link  BW:  122.7  Mbps 

Power  Used:  17.5  units 


Tenant  ToR  WCS  (avg):  0.60 
Mean  Link  BW:  163.6  Mbps 

Power  Used:  22  units 
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(a)  Mutated  from  Fig.  2.2a  (b)  Mutated  from  Fig.  2.2b 

Figure  2.3:  Two  compromises  from  successive  rounds  of  recombination  and  imitation 
of  nondominated  candidates. 


as  desirable  solution  candidates. 

This  is  especially  critical  when  the  operator  requirements  are  complex.  The  pro¬ 
posal  in  Figure  2.2b  is  more  appropriate  for  mission  critical  applications  that  require 
maximum  fault  tolerance,  while  the  proposal  in  Figure  2.2a  is  better  suited  towards 
general  applications  that  do  not  require  such  a  high  level  of  availability.  Depending 
on  the  criticality  of  the  tenant  applications  Rl,  R2,  and  R3,  the  operator  can  easily 
implement  either  proposal.  In  contrast,  [10, 11]  would  discard  one  of  these  desirable 
proposals,  leaving  the  operator  with  only  one  proposal  without  presenting  further 
desirable  evolutions  (as  shown  in  Figure  2.3)  to  the  operator. 

2.3  Multi-Objective  Optimization  Formulation 

In  this  section,  we  formally  model  VM  allocation  involving  M  tenant  applications, 
N  NCFs,  and  physical  topology  P,  as  a  MOP.  Let  A"  represent  the  set  of  all  possible 
allocations;  each  instance  x  G  X  captures  a  proposed  allocation  (e.g.,  for  the  Section 
II  scenario,  x  is  a  32-elenrent  vector  describing  which  application,  if  any,  occupies  each 
of  the  VM  slots  of  P ).  Let  Xf  C  X,  termed  feasible  allocation  set,  denote  the  subset 
of  all  allocations  that  can  meet  the  requirements  of  all  applications  (e.g.„  quantity  of 
VMs,  intra-VM  network  bandwidth,  etc). 

Let  fi()  denote  the  utility  function  used  to  evaluate  the  goodness  of  an  allocation 
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with  respect  to  NCF  i,  i  =  1  (e.g.,  ToR  WCS  for  NCF-S,  mean  link  BW  for 

NCF-B,  and  total  power  usage  for  NCF-P).  We  believe  this  function  will  most  likely 
be  defined  by  the  data  center  operator  to  account  for  local  conditions,  possibly  with 
input  form  the  NCF  vendor.  Therefore,  for  a  proposed  allocation  x  G  Xf,  the  objective 
vector  y  —  (/i(x),/2(x),...,/jv(x))  precisely  captures  its  operational  merit  from  the 
perspectives  of  all  objectives. 

Instead  of  computing  a  weighted  sum  or  using  other  techniques  to  reduce  the  objective 
vector  to  a  single  scalar  metric  and  then  hireling  one  “best”  allocation  with  respect 
to  that  metric,  as  done  in  prior  work,  we  leverage  the  classical  MOP  literature  [17] 
to  look  for  a  set  of  solutions  that  illuminates  the  entire  trade-off  space  of  the  diverse 
objectives.  First,  we  formally  define  when  two  allocations  can  be  ordered  in  the  77- 
dimension  objective  vector  space,  i.e.,  when  one  dominates  the  other,  and  when  they 
cannot  because  they  may  be  preferred  for  different  objectives. 

Def.  1.  (Pareto  Dominance  [19])  For  any  two  allocations  x\,X2  G  Xf, 

(i)  x,[  >-  x2  {x\  dominates  x2)  iff  i)  >  ffx  2) 

and  Vj  i,fj{x i)  >  fj(x 2). 

(ii)  x\  ~  X2  (xi,  X2  are  nondominated  w.r.t.  each  other) 

iff3i,j.i^j  and  ft(x{)  >  /*(x2)  and  fj(x2)  >  fj(x i). 


This  concept  of  Pareto  dominance  allows  us  to  define  the  optimality  criterion  for  the 
MOP  formulation.  If  some  allocation,  x ,  is  not  dominated  by  any  other  allocation, 
then  this  means  that  x  is  optimal  in  the  sense  that  it  cannot  be  improved  in  any 
objective  without  causing  degradation  in  at  least  one  other  objective.  Such  solutions 
are  referred  to  as  Pareto-optimal  [19]. 

Def.  2.  (Pareto  Optimality  [19])  An  allocation  x  is  Pareto -optimal  regarding  the 
feasible  allocation  set  Xf  iff  x  G  Xf  and  for  no  other  proposal  x'  G  Xf  is  x'  >-  x. 

The  entirety  of  all  Pareto-optimal  solutions  is  called  the  Pareto-optimal  set ,  denoted 
by  Xp]  the  corresponding  objective  vectors  form  the  Pareto- optimal  front  or  surface, 
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denoted  by  Yp.  Now  a  MOP  formulation  of  the  VM  allocation  problem  is  simply  as 
follows. 

Def.  3.  (MOP  for  SDN  Orchestration) 

maximize  y  =  /^(x), ...,  /jy(x)),  i.e.,  enumerate  all  Pareto-optimal  so¬ 

lutions, 

subject  to  x  G  Xj  and  additional  criteria  such  as  lower  bounds  for  fi{x)  ’s. 

2.4  Evolutionary  Approach 

Prior  MOP  work  [18,31,32]  shows  that  an  evolutionary  approach  (i.e.,  genetic  al¬ 
gorithm),  which  keeps  track  of  potential  nondominated  solutions  and  evolves  (i.e., 
expands  and  improves)  them  via  mutation  and  recombination,  can  ensure  1)  subopti- 
mal  local  maxima  tend  to  be  avoided,  and  2)  a  wider  range  of  solution  candidates  will 
be  considered  vs.  a  greedy  approach.  In  this  section,  we  present  such  an  evolutionary 
algorithm,  termed  Evolutionary  Algorithm  for  SDN  Orchestration  (EASO),  to  solve 
the  MOP  problem  formulated  in  the  previous  section. 

2.4.1  EASO  Specification 

A  specification  of  the  EASO  algorithm  is  provided  below.  In  addition  to  Pareto- 
dominance,  fitness  assignment  in  Step  2  is  also  based  on  crowding  distance,  which 
measures  the  uniqueness  of  a  candidate  solution  with  respect  to  other  members  in 
the  set,  as  done  in  [31,32], 

Algorithm  2.1.  EASO 

Input:  K:  number  of  generations;  L:  external  set  size; 

P:  physical  topology  tree;  s:  mutation  size 

Output:  solution  set  Xs 

Step  1 :  Initialization: 

a)  Set  initial  population  Ao  =  0,  k  =  0 

b)  Set  initial  external  set  ESq  =  0 
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c)  Solicit  each  of  the  N  NCFs  for  its  proposed  allocation 

x  e  Xf  and  add  x  to  Aq 

d)  Add  ( L  —  N )  randomly  generated  allocations  to  Aq 

Step  2:  Fitness  Assignment  /  Termination: 

a)  Calculate  fitness  F  of  allocations  in  Ak  U  ESk 

b)  Derive  Jiondominated  feasible  set  Xs  from  AkUESk 

c)  If  k>  K  then  return  Xs 

Step  3:  Update  of  external  set: 

a)  If  |ATS|  >  L  then 

Remove  (|X,S|  —  L)  worst  fitness  allocations  from  Xs 

b)  Else  if  |  Ag  [  <  L  then 

Add  (L—  |  Xs  | )  best  fitness  allocations  of  A^  to  Xs 

c)  Set  ESk+1  =  Xs 

Step  4:  Recombination: 

a)  Set  mating  pool  MP  =  Xs,  child  pool  CP  =  0 

b )  For  each  NCF  i  do 

1 )  Sort  MP  in  ascending  order  of  fi 

2)  Fora  in  1  to  |_|MP|/2J  do 

b=  (\MP\  +  l)-a 

x=  RECOMBINE(MP[a],MP[6]) 

Add  x  to  CP 

Step  5:  Mutation: 

a)  For  each  NCF  i  do 
1 )  For  each  allocation  x  G  MP  do 

u  =  MUTATE(a:,  s,i,goal  =  “  Improve") 
w  =  MUTATE(a:,s,i,  goal  =  “Degrade”) 

Add  u,w  to  CP 

Step  6:  Update  Population: 

a)  Set  Ak_ |_i  =  CP,  k  =  k  +  1;  Goto  Step  2 


2.4.2  Evolutionary  Primitives 

In  EASO,  the  MUTATE  primitive  procedure  takes  an  NCF  i  as  an  input  parameter, 
and  uses  an  NCF-specific  heuristic  to  attempt  to  relocate  up  to  s  VMs  (mutation  size) 
in  order  to  improve  (or  degrade)  the  value  of  /) .  Although  not  strictly  necessary, 
the  degrade  step  is  included  in  order  to  increase  entropy  and  help  to  maximize  the 
diversity  of  the  candidate  solution  set.  Because  a  tradeoff  space  is  assumed,  by 
intentionally  degrading  the  utility  of  one  NCF,  another  may  benefit.  For  the  example 
scenario  described  in  Section  2.2,  the  following  NCF-specific  mutation  heuristics  are 
used  within  the  MUTATE  procedure.  Here,  we  use  the  term  affinity  to  refer  to  the 
number  of  VMs  of  a  particular  application  residing  in  the  same  subtree. 

•  /i  (ToR  WCS)  :  1)  Identify  the  application  m  with  the  lowest  value  of  ToR 
WCS.  2)  Relocate  up  to  s  VMs  of  m  from  the  highest  affinity  subtree  of  the 
physical  topology  to  some  number  of  lower  affinity  subtrees. 

•  f‘2  (Bandwidth  Conservation)  :  1)  Identify  the  application  m  with  the  highest 
BW  usage.  2)  Relocate  up  to  s  VMs  of  m  from  the  lowest  affinity  subtree  to 
higher  affinity  subtrees. 

•  /3  (Power  Conservation)  :  1)  Identify  the  application  m  using  the  highest  num¬ 
ber  of  racks  (and  servers  in  the  case  of  a  tie).  2)  Remove  up  to  s  VMs  of  m 
from  the  lowest  affinity  subtree  and  replace  them  using  a  “first-fit”  bin  packing 
heuristic. 

In  contrast  to  MUTATE,  the  RECOMBINE  primitive  procedure  is  NCF-agnostic, 
and  simply  performs  a  merging  of  two  input  allocations  by  randomly  selecting  VM 
placements  from  each  to  form  a  new  output  allocation.  To  help  encourage  diversity 
during  the  recombination  step,  the  mating  pool  MP  is  sorted  in  each  dimension  fi, 
and  for  each  sorting,  each  candidate  solution  is  recombined  with  its  counterpart  at 
the  opposite  end  of  the  fi  spectrum,  i.e.,  first  vs.  last,  second  vs.  second-to-last,  etc. 

2.4.3  Complexity  Analysis 

Ideally,  the  solution  set  Xs  returned  by  EASO  is  equal  to  the  Pareto-optimal  set 
(denoted  by  Xp).  However,  the  size  of  the  feasible  allocation  set  Xf,  and  hence  the 
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time  required  to  totally  enumerate  Xp,  grows  combinatorially  with  the  number  of 
switches  and  servers  in  the  physical  topology  tree  (denoted  by  \P\).  For  nontrivial 
values  of  |P|,  e.g.,  the  large  topology  used  in  Section  2.5.3,  totally  enumerating  Xp 
may  require  prohibitively  large  EASO  input  parameters.  In  these  cases,  Xs  is  rather 
an  inner  approximation  [18]  of  Xp,  i.e.,  elements  in  Xs  may  be  dominated  by  those 
in  Xp. 

Space  Complexity:  The  maximum  population  size  contains  \CP\  =  N  *  (L/2  +  2L)  = 
states,  and  each  state  contains  \P\  elements,  hence  yielding  a  space  complexity 
of  0(N-L-\P\). 

Time  Complexity:  Each  of  the  proposals  are  evaluated  by  N  utility  functions, 
and  each  utility  function  evaluates  at  most  |P|  elements  of  each  proposal,  for  a  re¬ 
sultant  complexity  of  0(N2  ■  L-  |P|).  Fitness  assignment  requires  at  most  ()(N  ■  Li2) 
comparisons  using  the  scheme  presented  in  [32],  RECOMBINE  is  called  ^  times, 
and  MUTATE  is  called  2 NL  times.  Each  call  to  MUTATE  performs  at  most  s 
VM  reallocations,  and  the  main  algorithm  loop  runs  K  times.  Hence,  the  total  time 
complexity  of  this  algorithm  is  0(N 2  ■  L2  ■  \P\  ■  K  ■  s). 

2.5  Evaluation 

In  this  section,  we  evaluate  EASO  using  two  topologies.  First,  the  simplistic  scenario 
described  in  Section  II  is  used  to  illustrate  that  EASO  can  produce  more  diverse, 
and  potentially  better  solutions  than  current  orchestators.  Then,  a  relatively  large 
topology  [4]  is  used  to  illustrate  that  EASO  scales  to  large  data  centers. 

2.5.1  Performance  Metrics 

As  discussed  in  Section  2.4.3,  for  nontrivial  topologies,  EASO  may  only  (inner)  ap¬ 
proximate  the  Pareto-optimal  set  Xp.  To  evaluate  the  accuracy  of  this  inner  approx¬ 
imation  (denoted  by  Xs),  we  propose  two  metrics:  distance  and  coverage  to  compare 
Xs  against  Xp  using  their  corresponding  sets  of  objective  vectors,  i.e.,  images  in  the 
objective  vector  space.  Specifically,  let  Ys  and  Yp  denote  the  image  sets  of  Xs  and 
Xp,  respectively.  (When  it  is  infeasible  to  obtain  Xp  and  Yp  clue  to  nontrivial  values 
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of  |Tj,  we  use  the  “constraint  method”  well  known  in  MOP  literature  [17]  to  first  con¬ 
struct  an  outer  approximation  of  Yp,  and  then  use  it  in  place  of  Yp  in  equations  (2.1) 
and  (2.2)  to  compute  distance  and  coverage.  See  Section  2.5.3  for  more  details.)  The 
distance  of  an  objective  vector  y  G  Ys  is  defined  as  follows. 

Def.  4.  (Distance) 

distance (y,  Yp)  =  min  dist(y,w),  (2.1) 

weYp 

where  dist(y,w )  represents  the  Euclidean  distance  between  points  y  and  w. 


Once  the  distance  of  each  point  y  eYs  has  been  calculated,  we  calculate  the  mean, 
min,  and  max  distances  of  points  in  Ys  to  provide  a  set  of  distance  measures  repre¬ 
sentative  of  the  solution  set  as  a  whole. 


The  current  orchestrators,  such  as  [10, 11]  strive  only  to  find  a  single  solution  that 
minimizes  distance,  without  regard  for  the  potentially  vast  tradeoff  space.  In  contrast, 
a  novel  aspect  of  our  approach  is  the  enumeration  of  a  set  of  nondominated  tradeoffs. 
Hence,  to  evaluate  the  area  of  the  tradeoff  space  covered  by  a  solution  set  produced 
by  EASO  or  similar  algorithms,  we  propose  the  coverage  metric  (Def.  5),  representing 
the  fraction  of  points  in  the  reference  image  set  Y'p  that  are  “covered,”  i.e.,  nearest 
to  objective  vectors  in  Ys.  Hence,  solution  sets  with  higher  coverage  values  are  more 
desirable. 

Def.  5.  (Coverage) 


coverage{Ys,  Yp) 


|Uyeya  nearest(y,Yp)\ 

\Yp\ 


(2.2) 


where  nearest(y ,  Yp)  =  argmin weYpdist(y,w). 


2.5.2  EASO  vs.  Current  Orchestrators 

In  order  to  illustrate  the  unique  merits  of  EASO  vs.  the  current  orchestrators,  we 
developed  GASO,  a  greedy  version  of  EASO,  to  emulate  methods  proposed  in  [10,11]. 
In  [10,11],  the  authors  do  not  explicitly  state  the  mutation  heuristics  used  by  inde¬ 
pendent  NCFs  to  generate  incremental  counterproposals,  but  rather  defer  this  issue 
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to  future  work,  whereas  the  GASO  mutation  heuristics  (identical  to  EASO,  Section 
2.4.2)  explicitly  specify  how  such  counterproposals  are  suggested.  Additionally,  we 
enhance  GASO  to  enumerate  not  just  one  solution,  but  a  set  of  solutions,  as  described 
later  in  this  section. 

GASO  has  four  notable  differences  vs.  EASO:  1)  the  recombination  step  is  omitted, 
2)  the  Pareto-based  fitness  function  F  is  replaced  with  the  global  objective  function 
Fgiobal(x)  —  +W2(f2(x))  +  ... +WN(fN(x))  where  w  is  an  Af-dimensional 

NCF  weight  vector  and  w%  represents  the  weighting  value  for  NCF  i,  3)  the  external 
set  ES  contains  only  a  single  member  (L  —  1):  the  solution  candidate  with  the  highest 
value  of  F and  4)  the  algorithm  terminates  when  no  NCF-specihc  mutation  of 
the  external  set  member  yields  a  higher  Fgi0bai  value. 

To  compare  GASO  to  EASO,  we  generated  a  comparable  set  of  GASO  solutions  for 
the  Section  2.2  scenario  by  way  of  parametric  analysis  over  a  set  of  fixed  aspira¬ 
tion  levels  (lower  bounds)  for  f\  (WCS),  and  different  weightings  for  /2  (BW)  and 
fs  (power).  For  each  aspiration  level  of  /i,  f\  >  0.00,0.066, ...,0.594;  we  used  two 
different  weightings:  w  =  (1,4,2),  which  clearly  favors  fi  over  fy,  and  w  =  (1,2,4), 
which  conversely  favors  fs  over  /2-  f\  maintains  a  minimum  weighting  here,  as  the 
aspiration  levels  force  an  enumeration  over  the  its  range. 

For  EASO,  we  set  the  size  of  the  external  set  L  =  25,  the  number  of  generations  K  = 
25,  and  the  mutation  size  s  —  5.  EASO  consistently  enumerated  all  14  Pareto-optimal 
solutions2  for  each  of  100  simulation3  runs,  represented  by  the  XfASO  solution  set 
(Table  2.1). 

For  GASO,  we  performed  multiple  runs  via  parametric  analysis,  across  the  range  of 
all  mutation  sizes  (s  =  1,2,. ..,15).  The  resulting  set  of  solutions,  X^ASO ,  represents 
the  best  solutions  produced  by  GASO  throughout  all  270  simulation  runs.  GASO  was 
only  able  to  enumerate  six  of  the  fourteen  distinct  Pareto-optimal  states  (Table  2.1). 

2In  this  simplistic  scenario,  we  were  able  to  enumerate  the  entire  Pareto-optimal  front  Yp  (14 
solutions)  via  brute  force  enumeration,  and  hence  used  Yp  as  a  basis  of  comparison  for  EASO  and 
GASO. 

3The  simulation  consists  of  approximately  2500  lines  of  Java  code,  and  was  run  on  a  Linux  VM 
allocated  8GB  of  RAM  and  2  x  vCPUs.  The  host  PC  (laptop)  was  running  64-bit  Windows  on  an 
Intel  2.4  GHz  quad-core  processor  with  12  GB  of  RAM. 
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XEAS° 

XGASO 

Allocation 

(WCS,  BW,  Power) 

Allocation 

(WCS,  BW,  Power) 

1 

[(5,0,0),  (0,5,0),  (0,0,5),  (0,0,0)] 

(0.000,  68  Mbps,  23  units) 

[(5,0,0),  (0,5,0),  (0,0,5),  (0,0,0)] 

(0.000,  68  Mbps,  23  units) 

2 

[(4,0,0),  (1,5,0),  (0,0,5),  (0,0,0)] 

(0.067,  72  Mbps,  22  units) 

[(3,4,0),  (2,1,5),  (0,0,0),  (0,0,0)] 

(0.200,  86  Mbps,  17.5  units) 

3 

[(3,5,0),  (2,0,5),  (0,0,0),  (0,0,0)] 

(0.133,  77  Mbps,  17.5  units) 

[(3,4,1),  (2,1,4),  (0,0,0),  (0,0,0)] 

(0.267,  100  Mbps,  17.5  units) 

4 

[(2,5,0),  (2,0,0),  (1,0,5),  (0,0,0)] 

(0.200,  84  Mbps,  22  units) 

[(3,3,1),  (2,2,4),  (0,0,0),  (0,0,0)] 

(0.333,  109  Mbps,  17.5  units) 

5 

[(3,4,0),  (2,1,5),  (0,0,0),  (0,0,0)] 

(0.200,  86  Mbps,  17.5  units) 

[(3,3,2),  (2,2,3),  (0,0,0),  (0,0,0)] 

(0.400,  123  Mbps,  17.5  units) 

6 

[(2,4,0),  (2,1,5),  (1,0,0),  (0,0,0)] 

(0.267,  93  Mbps,  22  units) 

[(2,2,3),  (2,2,2),  (0,0,0),  (1,1,0)] 

(0.533,  143  Mbps,  22  units) 

7 

[(3,4,1),  (2,1,4),  (0,0,0),  (0,0,0)] 

(0.267,  100  Mbps,  17.5  units) 

[(2,2,1),  (1,1,2),  (1,1,1),  (1,1,1)] 

(0.600,  191  Mbps,  25  units) 

8 

[(2,3,0),  (2,2,0),  (1,0,5),  (0,0,0)] 

(0.333,  102  Mbps,  22  units) 

9 

[(3,3,1),  (2,2,4),  (1,1,1),  (0,0,0)] 

(0.333,  109  Mbps,  17.5  units) 

10 

[(3,3,2),  (2,2,3),  (0,0,0),  (0,0,0)] 

(0.400,  123  Mbps,  17.5  units) 

11 

[(2,3,1),  (2,2,4),  (1,0,0),  (0,0,0)] 

(0.400,  116  Mbps,  22  units) 

12 

[(3,5,0),  (2,0,5),  (0,0,0),  (0,0,0)] 

(0.467,  130  Mbps,  22  units) 

13 

[(2,2,3),  (2,2,2),  (1,1,0),  (0,0,0)] 

(0.533,  143  Mbps,  22  units) 

14 

[(2,2,2),  (2,2,2),  (1,1,1),  (0,0,0)] 

(0.600,  164  Mbps,  22  units) 

Table  2.1:  X^ASO  and  X^ASO  nondominated  solutions.  The  “Allocation”  column 
represents  the  allocation  of  VMs  to  servers  on  the  four  different  racks,  e.g.,  [(3,5,0), 
(2,0,5),  (0,0,0),  (0,0,0)]  represents  the  assignment  of  3  VMs  of  R1  and  5  VMs  of  R2 
to  Rack  1,  2  VMs  of  R1  and  5  VMs  of  R3  to  Rack  2,  and  none  to  Racks  3  and  4. 


ysEASO 

Ysgaso 

Distance  (Mean,  Min,  Max) 

(0,  0,  0) 

(0.014,  0,  0.097) 

Coverage 

1 

0.5 

Execution  Time  (seconds) 

3.73 

0.45 

Table  2.2:  EASO  vs.  GASO  in  distance  and  coverage  of  their  solution  sets  w.r.t.  Yp, 
and  in  avg.  execution  time. 


Note  that  X^ASO  alloc.  #7,  although  nondominated  with  respect  to  XfASO,  is  not 
Pareto-optimal,  as  it  is  dominated  by  XfASO  alloc.  #14.  Moreover,  X^ASO  contains 
four  additional  dominated  solutions  not  displayed  in  Table  2.1.  These  suboptimal 
solutions  show  that  GASO  was  often  stuck  in  local  maxima. 

Table  2.2  presents  a  comparison  between  Y^ASO  and  Y^ASO  in  terms  of  the  metrics 
presented  at  the  beginning  of  this  section,  using  Yp  as  the  reference  set.  The  solu- 
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tion  set  produced  by  EASO  has  smaller  distance  and  higher  coverage  ratio  GASO. 
These  results  demonstrate  that  EASO  yields  a  wider  range  of,  and  potentially  better 
solutions  than  the  SOP  orchestrators  in  [10,11].  Furthermore,  and  perhaps  the  most 
distinguishing  feature  of  EASO,  is  how  well  it  enumerates  the  tradeoff  space. 

To  illustrate  this  point,  again  consider  Table  2.1,  representing  the  nondominated 
solutions  returned  by  EASO  and  GASO.  Now  suppose  a  network  operator  using 
GASO  decides  that  GASO  alloc.  #5  (equiv.  to  EASO  alloc.  #10)  is  most  appropriate 
for  his/her  needs,  because  it  offers  the  best  compromise  between  BW  and  WCS. 
However,  EASO  alloc.  #11  is  a  better  compromise,  as  it  offers  the  same  level  of 
WCS  as  GASO  alloc.  #5,  but  even  better  BW,  at  the  expense  of  power.  Moreover, 
the  diverse  EASO  solution  set  allows  an  operator  to  program  the  orchestrator  to 
automatically  select  an  allocation  based  upon  the  prevailing  network  conditions.  For 
example,  run  EASO  alloc.  #11  during  peak  hours  to  conserve  BW,  and  EASO  alloc. 
#10  during  non-peak  hours  to  save  power. 

2.5.3  Is  EASO  Scalable? 

To  evaluate  its  scalability,  we  simulated  EASO  on  a  large-scale,  multi-tier  application 
data  center  scenario  similar  to  the  one  presented  in  [4] ,  but  with  the  additional  third 
objective  of  power  conservation  (adjusted  for  various  host/ToR  power  consumption 
ratios).  Specifically,  we  ran  EASO  on  a  simulated  physical  infrastructure  consisting 
of  40  aggregation  switches,  160  ToR  switches  (4  x  ToRs  per  aggregate),  2560  hosts 
(16  x  hosts  per  ToR),  and  40960  VM  slots  (16  x  VM  slots  per  host),  for  the  following 
5-tier  application  requirements:  Tl:  <40  x  4,  10  Mbps>,  T2:  <40  x  1,  100  Mbps>, 
T3:  <40  x  2,  50  Mbps>,  T4:  <40  x  1,  100  Mbps>,  T5:  <40  x  4,  10  Mbps>.  Here 
the  first  element  in  each  tuple  represents  the  number  of  VMs  and  slots  required  per 
VM,  e.g.,  <40  x  4,  10  Mbps>  denotes  40  VMs  requiring  4  slots  and  10  Mbps  BW 
each.  The  NCFs  remain  the  same  as  presented  in  Section  2.2. 

At  this  scale,  totally  enumerating  Yp  is  intractable  [4],  Therefore,  we  constructed 
an  outer  approximation  (OA)  of  Yp  based  on  the  well  known  “constraint  method” 
found  in  MOP  literature  [17].  Specifically,  we  formulate  each  ordered  pair  of  NCF 
utility  functions,  (/1J2),  (/1J3),  (/2J1),  (/2J3),  (/3,/i),  (/3J1)  as  a  biobjective  op- 
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YsEAS0 

Ysgaso 

(L  =  25  ,K=  25, 

S=2) 

(1  =  50  ,  K  =  50, 

S  =  4) 

(L  =  75,  K  =  75, 
s  =  6) 

(fj  >  .005,  .010 . 975) 

(s  =  1,  2, ... ,  40) 

Distance  (Mean,  Min,  Max) 

(0.122,  0.0,  0.241) 

(0.094,  0.0,  0.248) 

(0.047,  0.0,  0.174) 

(0.197,  0.0,  0.244) 

Coverage 

0.030 

0.059 

0.089 

0.033 

#  of  Nondominated  Solutions 

83 

109 

197 

43 

Execution  Time  (seconds) 

69 

566 

5059 

1184 

Table  2.3:  Performance  of  three  runs  of  EASO  vs.  GASO  for  large  scenario.  Best, 
moderate,  and  worst  results  are  shaded  green,  yellow,  and  red,  respectively. 


timization  problem  (BOP)  where  the  first  utility  function  is  cast  as  a  discrete  set 
of  lower  bounds  (e.g.,  Cfx  for  fi  where  Cf1  =  {0.005,0.010, ...  ,0.975}),  and  the  sec¬ 
ond  is  maximized  for  each.  The  OA  then  consists  of  all  points  (cji;  /2,  fs)  where  f~2 
and  /3  are  separately  maximized  for  each  Cfx  eCjj,  and  so  on  for  ( fl,Cf2,fs )  and 
(/i;/2,c/3)-  Note  that  tractably  constructing  a  tight  OA  for  nonlinear  BOPs  is  a 
challenging  problem  in  and  of  itself  [33]. 

Table  2.3  illustrates  the  performance  of  EASO  with  respect  to  OA  for  different  sets 
of  input  parameters4.  Observe  that  there  is  a  clear  tradeoff  between  time  and  opti¬ 
mality.  As  the  size  of  input  parameters  ( L,K,s )  increase,  EASO  produces  better  and 
more  diverse5  solution  sets  at  the  cost  of  increased  execution  time.  The  “short-run” 
parameter  set  (25,25,2),  completes  in  just  over  a  minute,  hence  most  appropriate  for 
network  operators  with  rapidly  changing  tenant  application  requirements.  In  con¬ 
trast,  the  “long-run”  parameter  set  (75,75,6)  takes  over  an  hour  to  complete,  and 
thus  may  be  warranted  for  steady  state  data  center  operations  where  network  con¬ 
figurations  are  unlikely  to  change  frequently.  Finally,  the  “medium-run”  parameter 
set  (50,50,4)  finishes  in  under  ten  minutes,  and  represents  a  reasonable  compromise 
between  agility  and  quality.  The  increase  of  EASO  execution  time  (last  row  of  Table 
2.3)  corroborates  the  time  complexity  analysis  in  Section  2.4.3. 


4For  comparative  purposes,  we  ran  a  fine-grained  parametric  analysis  of  GASO  over  using  a 
range  of  mutation  sizes.  Note  that  GASO  performed  worse  than  the  EASO  “medium-run”  parameter 
set  in  every  category,  including  execution  time. 

5Because  the  size  of  OA  is  very  large  (843  solutions),  coverage  should  be  viewed  as  a  relative 
metric,  as  obtaining  high  absolute  coverage  values  is  not  possible  for  relatively  small  values  of  L. 
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•  EASO 


0.4  °-5 

0  6  Power  Consumption 

Mean  Link  BW  (Inverse)  0.8  0  (Inverse) 

Figure  2.4:  EASO  Nondominated  Front  vs.  Outer  Approximation  (OA)  of  Yp,  for  a 
large-scale  data  center  scenario. 

Figure  2.4  depicts  the  EASO  “long-run”  solution  set  vs.  OA  for  the  large-scale  data 
center  scenario.  From  this  figure,  we  can  see  that  the  EASO  solution  set  is  well  spread 
and  relatively  close  regarding  OA.  Also  realize  that  the  EASO  solutions  are  at  least 
as  close  to  Yp  as  OA,  since  points  in  OA  are  not  necessarily  feasible. 

2.6  Conclusion 

We  have  demonstrated  that  that  our  proposed  evolutionary  approach  can  enumerate 
a  wider  range  of,  and  potentially  better  solutions  than,  current  orchestrators  for 
relatively  large  data  center  networks. 

For  future  work,  we  find  several  areas  intriguing.  The  mutation  and  recombination 
evolutionary  primitives  may  be  further  refined  and  adapted  for  other  orchestration 
tasks,  such  as  traffic  engineering,  risk  management,  or  cybersecurity.  For  example, 
in  [16],  one  specialized  mutation  procedure  is  used  to  select  alternate  routing  paths 
between  network  services.  Fine-tuning  the  tradeoff  space  based  on  operational  re¬ 
quirements  and  automated  decision  making  with  respect  to  the  tradeoff  space  are 
other  promsining  areas,  e.g.,  how  to  enumerate  a  relevant  subset  of  the  tradeoff  space 
in  less  time,  or  how  to  select  the  best  EASO  candidate  solution  given  prevailing 
network  conditions. 
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CHAPTER  3: 

An  Application-Aware  Approach  to  Scalable  Online 
Placement  of  Data  Center  Workloads 


This  chapter  focuses  on  the  challenges  of  achieving  application-aware  resource  alloca¬ 
tions  that  not  only  provide  desirable  outcomes  in  terms  of  operator  objectives,  but  are 
also  desirable  for  minimizing  resource  contentions  both  within  and  between  applica¬ 
tion  workloads.  Moreover,  we  show  that  by  precomputing  ideal  workload  allocations 
in  a  contention-free  environment,  such  as  a  simulated  empty  topology,  the  resultant 
logical  application  workload  (LAW)  may  be  used  as  a  single  atomic  unit  for  allocating 
a  workload.  This  is  in  contrast  to  the  tens  to  hundreds  of  workload  VMs  that  would 
typically  require  individual  per  VM  allocation.  As  a  result,  not  only  are  the  corre¬ 
sponding  LAW  allocations  largely  free  of  application  contentions  vs.  existing  per  VM 
methods,  but  also  by  using  the  larger-grained  LAW  allocation  algorithms,  workload 
placement  can  be  accomplished  one  order  of  magnitude  faster  than  current  solutions, 
achieving  average  allocation  times  of  less  than  a  second  for  data  center  networks  with 
hundreds  of  switches  and  thousands  of  VMs. 

Data  center  operators  strive  for  maximum  resource  utilization  while  satisfying  tenant 
service  level  agreements;  however,  they  face  major  challenges  as  application  workload 
types  are  diverse  and  tenants  add,  remove,  and  update  their  workloads  sporadically  to 
meet  changing  user  demands.  Currently,  operators  allocate  workload  VMs  primarily 
in  an  application  agnostic  fashion,  focusing  on  minimizing  total  resource  usage.  In  this 
work,  we  first  show  that  such  allocations  can  be  suboptimal  and  then  present  a  new 
application-aware  approach  that  explicitly  models  resource  preferences  of  individual 
workloads.  Further,  we  propose  a  new  LAW  abstraction  to  enable  precomputation 
of  the  required  relative  positioning  of  an  application’s  VMs  and  allocation  of  these 
VMs  in  a  single  atomic  step,  leading  to  online  algorithms  that  are  one  order  of  magni¬ 
tude  faster  than  existing  per  VM  placement  solutions.  We  then  develop  a  statistical 
extension  of  LAW  to  add  flexibility  in  characterizing  application  requirements  and 
to  support  prioritization  of  workloads.  Using  realistic  workload  traces  and  physical 
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topologies,  we  evaluate  our  algorithms  in  a  simulated  large-scale  data  center  setting, 
and  demonstrate  their  performance  advantages  and  potential  tradeoffs  versus  existing 
solutions. 

3.1  Introduction 

The  data  center  operators  of  large  enterprises  and  public  cloud  providers  face  major 
challenges  in  allocating  virtual  machines  (VMs)  to  individual  application  workloads. 
First,  the  workloads  are  diverse  in  sizes  (measured  by  the  number  of  VMs  required, 
the  amount  of  link  bandwidth  required,  running  time,  etc.),  in  resource  consumption 
characteristics  (e.g.,  computation  intensive  vs.  data  intensive),  and  in  performance 
requirements  such  as  bounds  on  response  times  and  application  availability  [34],  Sec¬ 
ond,  the  increased  flexibility  and  agility  provided  by  software-defined  data  center 
networks  allows  enterprises  and  public  cloud  tenants  to  rapidly  add,  remove,  update, 
and  prioritize  their  workloads  to  meet  sporadically  changing  user  demands.  There 
is  an  urgent  need  for  online  allocation  algorithms  that  can  deal  with  such  dynamic 
behaviors  without  incurring  too  much  latency  [24,27].  Last,  but  not  least,  the  opera¬ 
tors  must  strive  for  efficient  uses  of  their  physical  resources  (link  bandwidth,  electrical 
power,  etc.)  in  order  to  minimize  cost  while  meeting  the  quality  of  service  require¬ 
ments  of  applications. 

Maximizing  resource  utilization  within  data  centers  is  a  problem  that  many  prior 
efforts  have  attempted  to  solve  using  various  approaches,  including  integer  linear  pro¬ 
gramming  [23],  greedy  heuristics  [3,4,10,11],  and  genetic  algorithms  [16,28].  However, 
we  observe  that  this  body  of  work  overwhelmingly  strives  to  optimize  cumulative  re¬ 
source  usage  among  all  hosted  tenant  applications  while  deferring  the  per  application 
performance  concerns  to  relatively  low  level  mechanisms  such  as  CPU  scheduling 
and  traffic  engineering,  despite  recent  studies  demonstrating  that  different  types  of 
workloads  contend  for  different  types  of  resources  and  consequently  how  VMs  of  an 
application  are  relatively  positioned  can  significantly  impact  the  performance  of  an 
application  [13, 14].  Specifically,  it  would  be  advisable  not  to  co-locate  VMs  of  com¬ 
putation  intensive  workloads  to  avoid  unnecessary  CPU  contention  while  at  the  same 
time,  position  VMs  of  the  same  data  intensive  workload  as  close  as  possible  to  reduce 
both  bandwidth  contention  and  communication  latency  [14]. 
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In  other  words,  while  workload  placement  should  concern  about  cumulative  resource 
usage,  additional  opportunities  exist  to  intelligently  place  the  VMs  of  individual  work¬ 
loads  to  improve  per  application  performance  of  all  hosted  applications.  In  this  paper, 
we  investigate  a  placement  strategy,  which  we  term  “application-aware”,  to  leverage 
such  opportunities.  More  specifically,  we  explicitly  model  the  resource  preference 
of  each  workload  and  develop  a  unified  framework  to  characterize  and  minimize  re¬ 
source  contentions  introduced  by  a  new  workload,  which  can  be  between  VMs  of  the 
same  workload  as  well  as  with  respect  to  existing  workloads.  Further,  we  enhance 
the  application-aware  approach  by  introducing  a  new  logical  application  workload 
(LAW)  abstraction.  The  LAW  represents  the  most  desirable  relative  positioning  of 
the  workload’s  VM  placement  in  a  physical  topology  in  terms  of  both  meeting  oper¬ 
ator  specific  resource  efficiency  goals  and  minimizing  resource  contentions.  We  show 
that  a  data  center  controller  can  precompute  LAWs  and  subsequently  assign  VMs  of 
each  workload  in  a  single  atomic  step,  leading  to  online  algorithms  that  are  one  order 
of  magnitude  faster  than  current  per  VM  placement  solutions. 

Our  contribution  is  multi-fold  as  follows. 

1.  First,  we  show  that  application  agnostic  workload  placement  can  introduce  un¬ 
necessary  resource  contentions,  and  demonstrate  potential  performance  gains  from 
precisely  modeling  resource  contentions  that  can  be  introduced  by  a  workload. 
(Section  3.2) 

2.  Second,  we  formally  define  LAW  and  describe  how  to  construct  LAWs  that  mini¬ 
mize  potential  resource  contentions.  We  then  design  a  range  of  bin  packing  heuris¬ 
tics  to  place  workloads  on  a  per  LAW  basis  and  at  the  same  time  optimize  different 
cumulative  resource  usage  objectives.  (Section  3.3) 

3.  Third,  we  observe  that  LAW  based  workload  placement  can  be  infeasible  sooner 
than  per  VM  placement.  Consequently,  we  propose  a  statistical  extension  of  the 
LAW  abstraction  to  add  flexibility  in  characterizing  application  requirements.  We 
show  that  the  extension  permits  a  data  center  operator  to  increase  LAW  placement 
feasibility  via  graceful  degradation  of  application  performance.  (Section  3.4) 

4.  Fourth,  we  show  that  the  statistical  LAW  extension  provides  a  natural  mechanism 
to  support  prioritization  of  workloads,  conceptually  similar  to  the  Random  Early 
Detection  (RED)  queueing  [35].  (Section  3.5) 
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5.  Fifth,  using  a  simulated  scenario  with  realistic  workload  traces  and  a  relatively 
large  data  center  topology,  we  evaluate  the  LAW  based  workload  placement  heuris¬ 
tics  and  demonstrate  their  performance  advantages  and  potential  tradeoffs  versus 
existing  solutions.  (Sections  3.4,  3.5,  and  3.6) 


3.2  Application- Aware  Placement 

We  motivate  and  illustrate  the  advantages  of  application-aware  placement  with  a 
simplistic  but  telling  scenario.  We  also  describe  how  to  model  a  workload’s  resource 
preference  and  present  metrics  to  characterize  resource  contentions  introduced  by  a 
workload. 

First,  from  prior  work  we  observe  that  there  are  two  basic  types  of  application  work¬ 
loads:  1)  CPU  or  memory  intensive  workloads  (e.g.,  meteorological,  geological,  and 
particle  physics  simulations,  or  other  high  performance  computing  tasks),  commonly 
referred  to  as  compute-intensive  (Cl)  workloads,  and  2)  network  or  storage  inten¬ 
sive  workloads  (e.g.,  client-server  or  Hadoop  and  other  big  data  applications),  com¬ 
monly  referred  to  as  data-intensive  (DI)  applications.  Generally,  state-of-the-art  re¬ 
search  [13, 14]  concludes  that  Cl  workloads  perform  better  when  the  processes  and 
VMs  are  spread  across  separate  CPU  cores  and  hosts,  respectively,  while  DI  work¬ 
loads  perform  better  when  the  VMs  (or  processes)  are  placed  on  the  same  host  (or 
CPU  core).  For  example,  by  placing  all  VMs  of  a  client-server  (DI)  application  on  the 
same  physical  host,  tenfold  increases  in  throughput  have  been  observed  [13].  In  con¬ 
trast,  over-contention  of  CPU  resources  by  VMs  of  Cl  applications  have  been  shown 
to  increase  job  completion  time  by  as  much  as  260%  [13]. 

Therefore,  we  propose  to  extend  the  current  workload  representations,  which  primar¬ 
ily  specify  the  number  of  VMs  and  intra-VM  bandwidth  [10,11,30],  to  also  include 
a  designation  of  application  type  (Cl  or  DI).  Such  a  classification  may  be  specified 
a  priori  by  tenants,  or  determined  post-allocation  by  the  operator  upon  monitoring 
workload  resource  usage  in  practice.  In  addition,  we  believe  it  is  advantageous  for 
a  data  center  to  model  the  survivability  requirement  (i.e.,  resiliency  against  single 
top-of-rack  (ToR)  switch  failures)  on  a  per  application  basis  because  typically,  differ- 


40 


Mean  Link  BW 
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Max  Link  BW 
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CPU  Contention 
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(a)  Application  Agnostic  Allocation 
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Max  Link  BW 
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(CRC):  0.111 


(b)  Application  Aware  Allocation 

Figure  3.1:  An  example  contrasting  application-aware  vs.  application  agnostic  VM 
allocations.  Green  slots  are  for  VMs  of  Rl,  red  R2,  brown  R3,  and  purple  R4.  “C” 
and  “D”  denote  a  Cl  and  DI  workload,  respectively.  The  allocation  on  the  right 
considers  application  type  (Cl  or  DI)  of  each  workload,  and  thus  attempts  to  spread 
VMs  of  Rl  and  R3  (Cl)  to  minimize  CPU  contention,  while  consolidating  VMs  of  R2 
and  R4  (DI)  to  minimize  communication  overhead. 


ent  applications  have  different  levels  of  importance  to  the  tenants.  Commonly,  the 
survivability  requirement  is  represented  by  a  metric  called  worst-case  survivability 
(WCS),  which  is  the  maximal  possible  fraction  of  VMs  taken  offline  due  to  a  single 
ToR  failure  [5]. 

Now  consider  the  following  simplistic  scenario.  Suppose  four  independent  tenant  ap¬ 
plication  workloads  Rl,  R2,  R3,  R4,  arrive  in  sequential  order  and  have  requirements 
of  <CI,  5,  10  Mbps,  0.5>,  <DI,  5,  50  Mbps,  0.5>,  <CI,  5,  10  Mbps,  0.5>,  and  <DI, 
8,  50  Mbps,  0.5>  respectively.  The  first  value  in  the  tuple  represents  the  application 
type  (appType),  the  second  value  represents  the  number  of  VMs  (numVMs)  required, 


41 


the  third  value  represents  the  inter- VM  bandwidth  (BW)  reservation  requirement6, 
and  the  fourth  value  represents  the  minimum  required  WCS7.  Suppose  the  under¬ 
lying  physical  infrastructure  has  a  hierarchical  tree  topology,  consisting  of  one  core 
switch  as  root,  two  aggregation  switches,  four  ToR  switches,  four  host  servers  per 
ToR  switch,  and  two  VM  slots  per  host  server,  where  a  “VM  slot”  is  defined  as  a 
standard  physical  resource  unit  (e.g.,  CPU,  Memory)  provisioned  to  a  VM.  Therefore, 
the  VMs  are  to  be  allocated  against  4  x  4  x  2  =  32  possible  slots. 

Current  solutions,  such  as  [3,4,10,16],  overwhelmingly  attempt  to  maximize  resource 
utilization  by  finding  a  “best  fitted”  allocation  of  tenant  application  workloads  that 
minimizes  some  weighted  global  cost  function  associated  with  data  center  resource 
usage,  and  more  specifically,  bandwidth  usage.  In  theory,  such  approaches  seem  effec¬ 
tive  in  maximizing  global  resource  utilization.  However,  because  global  cost  functions 
are  not  capable  of  representing  fine-grained  contention  points  of  independent  tenant 
application  workloads  (e.g.,  Cl  or  DI),  the  “globally  optimal”  network  allocations 
produced  by  current  solutions  are  unlikely  to  be  optimal  for  each  tenant  application 
comprising  it.  This  situation  is  illustrated  in  Figure  3.1.  The  left  part  (3.1a)  depicts 
a  best-fitted  allocation  representative  of  current  solutions,  which  minimizes  network¬ 
wide  bandwidth  usage  while  satisfying  application  requirements  Rl,  R2,  and  R3. 
However,  note  that  Cl  applications  Rl  and  R3  are  in  contention  with  themselves  for 
CPU  and  memory  on  hosts  H2,  H5,  and  H6,  and  DI  applications  R2  and  R4  do  not 
achieve  optimal  network/storage  sharing  benefit  via  host  colocation:  R2  misses  an 
opportunity  for  colocation  under  rack  T1  and  R4  misses  a  colocation  opportunity 
under  rack  T3.  Figure  3.1b,  in  contrast,  depicts  an  application- aw  are  best-fitted  al¬ 
location,  which  not  only  minimizes  network-wide  bandwidth  usage,  but  also  provides 
optimal  conditions  for  each  tenant  application.  Although  Cl  applications  Rl  and  R3 
compete  with  each  other  on  H12,  they  do  not  compete  with  themselves  on  any  host, 
and  DI  applications  R2  and  R4  achieve  maximum  intra-host  colocation. 

The  example  in  Figure  3.1  also  shows  that  application-aware  allocation  does  not 
always  degrade  each  global  utility  metric.  Here,  the  application-aware  allocation 

6Inter-VM  bandwidth  is  reserved  according  to  the  “hose”  model,  as  done  in  [30]. 

7Prior  work  in  datacenter  network  orchestration  [10,11]  asserts  that  a  WCS  of  0.5  should  be  the 
bare  minimum  acceptable  to  the  network  operator. 
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method  achieves  less  link  congestion  (i.e.,  smaller  max  BW)  by  using  more  physical 
servers  to  host  the  VMs,  a  tradeoff  that  typically  results  in  more  power  usage. 

The  insight  that  different  types  of  tenant  workloads  have  different  host  assignment 
preferences  motivates  the  need  for  a  metric  to  capture  such  preferences.  To  this  end 
we  propose  a  new  Resource  Contention  Index  (RCI)  to  evaluate  the  placement  of 
individual  tenant  applications  (Def.  6). 

Def.  6.  (RCI  for  placement  of  workload  w) 


0 


RCI(w)  =  { 


Host_Af  finityiw) 

[  1  —  Host_Af  finity{w) 


workload  requires  1  VM 
workload  is  Cl 
workload  is  DI 


where: 


Host_Affinity(w) 


(#of  hosts  hosting  VMs  of  w)  —  1 
(total  VMs  allocated  for  w)  —  1 


and  Host_Af  finityQ  is  defined  only  when  the  total  number  of  VMs  allocated  for  w 
is  greater  than  1. 


The  host  affinity  for  some  tenant  application  represents  the  degree  by  which  its  VMs 
are  intra-host  colocated.  A  value  of  ‘O’  indicates  that  no  application  VMs  share 
the  same  host,  and  thus  minimizes  intra-application  computational  contention  for 
a  Cl  workload,  whereas  a  value  of  ‘1’  indicates  that  all  application  VMs  share  the 
same  host,  hence  maximizing  intra-application  data  throughput  for  a  DI  workload. 
RCI  captures  this  fundamental  difference  in  application  preference  between  Cl  and 
DI  workloads  in  a  single  application-specific  metric.  Furthermore,  we  can  take  the 
average  (mean)  RCI  across  all  allocated  workloads  to  provide  a  measure  of  global 
application  efficiency.  Note  that  achieving  an  RCI  of  zero  for  all  workloads  is  not 
realistic  because  tenants  may  have  competing  objectives  or  constraints.  Survivability 
for  instance,  is  directly  at  odds  with  RCI  for  DI  workloads,  as  a  host  affinity  of  ‘V 
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implies  WCS  =  0.  But  having  zero  survivability  is  not  acceptable  for  most  tenants. 
As  another  example,  consider  tenant  budget  constraints  for  a  Cl  workload.  It  may 
be  cost  prohibitive  for  a  tenant  to  place  each  workload  VM  on  a  separate  physical 
host  or  CPU  core.  In  such  cases,  a  tenant  may  choose  to  settle  a  higher  intra-VM 
contention  ratio  (i.e. ,  RCI  >  0)  in  the  interest  of  lower  SLA  costs. 

A  useful  adaptation  of  RCI,  termed  computational  resource  contention  (CRC),  is 
presented  in  Def.  7. 

Def.  7.  (Computational  Resource  Contention  (CRC)) 


CRC  1  ^  k°s^s  hostin§  Cl  VMs)  —  1 

(total  Cl  VMs  allocated)  —  1 


(3.1) 


and  CRC  is  defined  only  when  the  total  Cl  VMs  allocated  is  greater  than  1. 


This  metric,  by  treating  all  Cl  applications  as  one  group,  represents  the  degree  of 
contention  for  computational  (CPU/Memory)  resources  throughout  the  network.  We 
argue  that  CRC  should  be  added  to  the  collection  of  global  cost  functions  used  for 
workload  placement,  as  this  metric  is  particularly  relevant  as  an  indicator  of  through¬ 
put  bottlenecking  for  Cl  applications,  similar  in  nature  and  importance  to  the  com¬ 
monly  used  link  congestion  metrics  of  mean  and  maximum  bandwidth  usage  for  DI 
applications.  For  the  example  depicted  in  Fig.  3.1,  the  application-aware  allocation 
on  the  right  achieves  a  much  lower  CRC  than  its  counterpart. 


3.3  LAWs  for  Application  Efficiency 

To  overcome  the  shortcomings  of  current,  non-application-aware  workload  placement 
approaches,  we  propose  a  new  logical  application  workload  (LAW)  abstraction,  which 
serves  as  the  fundamental  unit  of  allocation  for  an  application  workload,  as  opposed 
to  individual  VMs,  in  order  to  speed  up  the  allocation  and  explicitly  preserve  the 
relative  positioning  of  intra-application  VMs  with  respect  to  the  best  allocation  the 
operator  would  desire  to  obtain  for  the  application  on  a  simulated  empty  physical 
infrastructure. 
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3.3.1  Natures  of  LAW 


Essentially,  a  LAW  represents  the  best  possible  allocation  for  an  application  workload 
from  the  perspective  of  the  operator  with  respect  to  a  given  physical  infrastructure. 
It  is  specific  to  the  allocation  method  used  by  the  operator  to  meet  the  application’s 
resource  and  performance  requirements,  and  as  such  it  unambiguously  captures  the 
design  intent  of  the  operator. 

Formally,  we  define  a  LAW  as  follows. 

Def.  8.  (Logical  Application  Workload  (LAW))  Given  an  requirement  speci¬ 
fication  R  for  workload  w,  a  data  center  infrastructure  P  to  host  VMs  of  w,  and  a 
VM  allocation  method  M  used  by  the  operator,  a  LAW L(w,P,M,R )  is  a  hypothetical 
allocation  for  w  by  applying  M  on  an  empty  P  subject  to  requirements  R. 

When  the  physical  infrastructure  has  a  tree  topology,  a  LAW  can  be  simply  modeled 
as  the  subtree  that  contains  the  hypothetically  allocated  VMs,  with  additional  an¬ 
notations  of  required  resources.  For  the  application  requirement  tuple  considered  in 
this  paper,  i.e.,  the  four-tuple  of  (appType,  numVMs,  BW,  WCS),  each  switch  of  the 
subtree  should  be  annotated  with  two  parameters:  r_slots  for  the  total  number  of 
VMs  supported  underneath,  and  r_bw  for  the  total  amount  of  bandwidth  reservation 
required  on  its  upstream  link.  The  two  parameters  are  driven  by  the  number  of  VMs 
and  number  of  computational  resource  units  (i.e.,  slots)  required  by  each  VM,  and 
the  BW  requirements,  while  the  relative  positions  of  the  VMs  should  minimize  RCI 
while  meeting  the  WCS  bound. 

When  the  physical  resource  capacity  is  much  larger  than  what  the  workload  requires, 
many  subtrees  can  accommodate  the  LAW,  i.e.,  support  the  relative  positioning  of 
the  VMs.  In  such  a  case,  we  model  the  LAW  using  the  leftmost  subtree.  Similarly, 
before  the  physically  infrastructure  is  heavily  utilized,  allocation  of  a  LAW  should  be 
straightforward,  involving  a  small  number  of  checks  of  feasibility  against  the  r_slots 
and  r_bw  parameters.  In  other  words,  heuristics  that  allocate  a  workload  on  the 
basis  of  its  LAW  should  run  faster  than  their  per-VM  counterparts  in  most  scenarios 
and  therefore  should  be  more  suitable  as  an  online  solution. 
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Intuitively,  compared  to  a  finer-grain  per-VM  allocation,  a  LAW  based  allocation 
method  may  trade  off  some  network-wide  performance  such  as  power  efficiency  and 
bandwidth  utilization  in  order  to  maximize  the  quality  of  service  of  individual  appli¬ 
cations.  However,  the  former  has  its  own  problem  as  it  allocates  VMs  sequentially 
and  as  such  VMs  allocated  earlier  in  the  sequence  may  prevent  the  overall  allocation 
from  maximizing  global  utility.  An  investigation  of  this  interesting  tradeoff  will  be 
presented  in  Section  3.4.  In  the  rest  of  this  section,  we  focus  on  how  to  leverage  the 
concept  of  LAW  to  create  an  online  VM  allocation  solution  that  can  respond  to 
dynamic  workload  input. 

3.3.2  Constructing  Efficient  LAWs 

While  the  existing  application  agnostic  solutions  [3,4,10,10,11,16,28]  can  be  directly 
used  to  create  LAWs,  we  propose  to  extend  these  solutions  to  consider  application 
characteristics  (e.g.,  Cl  vs.  DI)  by  applying  Algorithm  3.1  post-allocation  in  order 
to  obtain  LAWs  with  the  smallest  RCI  possible  while  meeting  other  application  re¬ 
quirements.  This  is  accomplished  by  both  a)  spreading  the  Cl  VMs  of  each  rack  over 
the  maximum  number  of  available  hosts,  and  b)  concentrating  the  DI  VMs  of  each 
rack  onto  the  smallest  possible  number  of  physical  hosts.  Specifically,  as  presented 
in  Algorithm  3.2,  the  procedure  CONSTRUCT-LAW(Po, -^0  first  runs  an  existing  per 
VM  allocation  scheme  on  an  empty  physical  topology  (simulated),  and  then  calls  Al¬ 
gorithm  3.1  on  the  same  input,  which,  depending  on  the  application  type,  rearranges 
some  of  the  VMs  to  minimize  RCI  while  satisfying  other  requirements.  The  selected 
allocation  is  then  converted  into  a  LAW  by  (i)  removing  all  elements  of  the  physical 
infrastructure  tree  not  used  by  the  application,  and  (ii)  tallying  up  the  number  of 
descendant  VMs  (■ r_slots )  and  the  reserved  upstream  bandwidth  (■ r_bw )  parameters 
for  each  of  the  remaining  nodes. 

It  is  important  to  note  that  a  data  center  controller  can  pre-construct  LAWs  for 
expected  workloads8  against  available  infrastructures  and  store  the  results  (LAW 
subtrees)  in  a  hash  table.  This  way,  the  online  allocation  algorithm  (presented  next) 
will  need  minimum  time  to  obtain  the  LAW  for  a  new  workload  request. 

sFor  the  scope  of  this  paper,  store  all  combinations  of  elements  of  each  of  the  discrete  ranges  of 
numVMs,  intra-VM  BW,  and  WCS  for  both  Cl  and  DI  app  types. 
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Algorithm  3.1  “App- Aware  Allocation  Adjustment” 


1 
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procedure  App- Aware- Adjust(m;,  R ,  P) 

>  input  w:  workload  id 

>  input  R :  requirement  spec,  for  w 

>  input  P :  physical  topology;  allocated 

if  R.type  =  Cl  then  >  Minimize  host  affinity 

for  each  ToR  t  e  P  do 

Spread  VMs  of  w  across  hosts  in  t 
while  meeting  BW  and  WCS  requirements 
end  for 

else  if  R.type  =  DI  then  >  Maximize  host  affinity 

for  each  ToR  t  e  P  do 

Concentrate  VMs  of  w  onto  hosts  in  t 
while  meeting  BW  and  WCS  requirements 

end  for 
end  if 

end  procedure 


Algorithm  3.2  “LAW  Construction” 

1:  procedure  Construct-LAW(u>,P,Po) 

2:  >  input  w:  workload  id 

3:  >  input  R:  requirement  spec,  for  w 

4:  >  input  P0:  physical  topology;  empty 

5:  Per-VM-Allocation(w;,P,Po) 

6:  App-Aware-Adjust(w;,P,Po) 

7:  end  procedure 

3.3.3  Allocating  LAWs 

Allocating  a  LAW  to  a  physical  infrastructure  (which  may  or  may  not  be  empty) 
needs  to  meet  a  set  of  global  objectives  defined  by  the  network  operator.  These 
global  objectives  have  traditionally  included  minimizing  power  usage  [6]  and  link 
congestion  [16],  but  we  argue  that  the  CRC  objective  (Def.  7)  defined  in  Section  3.2 
should  also  be  considered. 

Conceivably,  an  algorithm  to  explore  the  efficient  frontier  of  LAW  allocations  with 
respect  to  the  global  objective  space  could  be  developed,  for  instance  an  algorithm 
similar  to  EASO  [28]  that  takes  a  set  of  LAWs  as  input  and  produces  a  set  of  efficient 


47 


proposed  allocations  as  output,  where  each  satisfies  all  LAWs.  However,  we  defer  this 
challenge  to  future  work,  and  instead  focus  efforts  on  the  more  fundamental,  low- 
level  problem  of  allocating  each  individual  LAW  in  a  manner  that  is  both  fast  and 
resource  efficient,  with  the  goal  of  minimizing  some  global  cost  function.  Specifically, 
we  consider  the  single-objective  cost  functions  of  1)  power  usage,  2)  link  congestion 
(i.e.,  mean  and  max  BW  usage),  and  3)  CRC. 

In  the  remainder  of  this  section,  we  describe  a  general  algorithm  for  allocating  a 
LAW  to  the  physical  infrastructure,  and  then  discuss  the  use  of  different  heuristics  for 
minimizing  the  global  cost  functions  described  previously.  Because  we  are  allocating 
LAWs,  i.e.,  sets  of  VMs  that  are  placed  with  strict  relative  positioning  requirements, 
we  cannot  directly  apply  heuristics  for  individual  VM  placement,  as  done  in  [36,37]. 

Thus,  we  propose  Algorithm  3.3,  which  takes  the  precomputed  LAW  L  for  an  appli¬ 
cation  workload  (assumed  to  satisfy  all  requirements  R  of  workload  w),  a  physical 
network  state  P  (i.e.,  physical  infrastructure  with  current  allocation),  and  a  cost¬ 
minimizing  heuristic  h  as  input,  and  attempts  to  find  a  feasible  allocation  for  the 
workload.  Specially,  we  consider  three  heuristics  in  this  paper:  (1)  “Min  Power”: 
referred  to  as  best- fit  or  tightest- fit  in  the  classic  bin  packing  problem  literature  [36] , 
it  seeks  to  map  VMs  of  L  onto  hosts  with  the  smallest  number  of  free  slots,  effectively 
minimizing  the  number  of  active  hosts;  (2)  “Min  BW”:  similar  to  the  max-rest  bin 
packing  heuristic  [36],  it  seeks  to  map  VMs  of  L  onto  hosts  with  the  largest  number  of 
free  slots,  effectively  balancing  bandwidth  allocation  and  minimizing  BW  congestion 
(i.e.,  maximum  link  BW);  (3)  “Min  CRC”:  designed  to  spread  Cl  workloads  as  evenly 
among  hosts  as  possible  while  attempting  to  place  DI  workloads  within  the  same 
subtree  as  previously  allocated  Cl  workloads  (to  allow  maximum  Cl  workload  distri¬ 
bution).  While  the  underlying  approach  of  Algorithm  3.3  is  generally  applicable  to 
tree-based  data  center  topologies  including  Fat-Tree  and  VL2,  for  ease  of  exposition, 
we  assume  that  P  has  a  simple  tree  structure. 

At  the  heart  of  LAW  based  allocation  is  to  map  each  LAW  node  (In)  to  a  unique 
physical  node  (denoted  by  In.pn )  at  the  same  tree  level  that  has  sufficient  resources 
to  support  the  requirements  of  In.r _slots  and  ln.r_bw.  After  considering  several 
different  tree  search  algorithms  to  explore  the  feasible  In  — >  In.pn  search  space,  in- 
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Algorithm  3.3  “Online  LAW  Allocation” 

1:  procedure  AhhOCATE-LAW  (L,P,h) 

2:  >  input  L :  LAW,  P :  physical  topology 

3:  >  input  h:  cost-minimizing  heuristic  method 

4:  for  each  level  l  from  “ToR”  to  “core”  do 

5:  for  each  LAW  node  In  £  L  at  level  l  do 

6:  for  each  physical  ToR  pn  G  P  at  level  l  do 

7:  if  AssiGN-MlN-CoST(Zra,pn)  <  oo  then 

8:  Save  best  matching  of  In. children, 

pn.children  for  ( ln,pn ) 

9:  end  if 

10:  end  for 

11:  if  no  saved  matchings  for  In  then 

return  false 
12:  end  if 

13:  end  for 

14:  end  for 

15:  Assign  mapping  L.root.pn  =  P.root  >  core  nodes 

16:  ME  =  edge  set  for  saved  best  matching  of 

L.  root,  children,  P.root.  children 
17:  while  M E  /  0  do 

18:  Extract  next  edge  ( ln,pn )  from  ME 

19:  Assign  mapping  In.pn  =  pn 

20:  Add  edge  set  for  saved  best  matching  of 

In.children, pn.children  to  ME 

21:  end  while 

22:  Allocate  VMs  according  to  In  — >  In.pn  mappings 

and  update  f_slots  and  f_hw  of  each  pn  accordingly. 

23:  Call  Alg.  3.1  for  application-aware  adjustment. 

24:  return  true 

25:  end  procedure 


eluding  depth-first  search,  breadth-first  search,  best-first  (A*)  search,  and  backtrack¬ 
ing  approaches,  in  the  interests  of  scalability,  we  ultimately  decided  to  model  the 
problem  of  hireling  the  “best”  set  of  feasible  In  — >  In.pn  mappings  as  a  minimization 
variant  of  the  classical  assignment  problem,  also  known  as  the  minimum  weighted 
bipartite  matching  problem  [38].  To  this  end,  Algorithm  3.3  uses  the  Hungarian 
or  Kuhn-Munkres  algorithm,  originally  proposed  by  Kuhn  [39]  and  later  refined  by 
Munkres  [40],  as  a  subroutine  in  the  Assign-Min-Cost  sub-procedure.  The  details 
are  given  below. 
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Algorithm  3.3  “Online  LAW  Allocation”  (Cont.) 

19:  procedure  AssiGN-MlN-CoST(7n,pn) 

20:  Initialize  a  bipartite  graph  G  =<Vi,Vp,E  =  0  > 

21:  where  V/  =  In.children  and  Vp  =  pn.children 

22:  for  each  pair  of  nodes  {u  £  Vi,v  £  Vp )  do 

23:  add  new  edge  <  u,v  >  to  E 

24:  if  v.f_  slots  >  u.r_slots  Apn.f_bw  >  u.r_bw 

then 

25:  h_val  =  GET-HEURISTIc(pn,/t,L) 

26:  set  edge  weight  =  u.r_slots  x  h_val 

27:  else  set  edge  weight  =  oo 

28:  end  if 

29:  end  for 

30:  Add  dummy  nodes  to  Vi  and  associated  oo  edges 

31:  until  ||Vj||  =  ||V^|| 

32:  ME  =  KUHN-MUNKRES(G) 

33:  >  ME  stores  the  minimum  weighted  edge  set 

34:  Remove  edges  with  dummy  nodes  from  ME 

35:  return  sum  of  edge  weights  of  ME 

36:  end  procedure 

37:  procedure  GET-HEURlSTlc(pn,/i,L) 

38:  if  h  =  “Min  Power”  then  return  pn.f _slots 

39:  end  if 

40:  if  h  =  “Min  BW”  then  return  pn.r_slots 

41:  end  if 

42:  if  h  =  “Min  CRC”  then 

43:  if  L.type  =  UCI"  then  return  pn.ci_slots 

44:  else  return  pn.slot_capadty  —  pn.ci_slots 

45:  end  if 

46:  end  if 

47:  end  procedure 


Checking  Feasibility 

Algorithm  3.3  determines  LAW  allocation  feasibility  by  first  checking  at  ToR  level  as 
follows.  For  each  LAW  ToR  7}  and  each  physical  ToR  Tp,  it  constructs  a  bipartite 
graph  consisting  of  hosts  of  Tj  on  one  end  and  hosts  of  Tp  on  the  other.  An  edge 
is  added  between  a  pair  of  nodes  on  the  opposite  ends  if  and  only  if  the  physical 
host  has  sufficient  VM  slots  to  support  the  LAW  host.  The  edge  weight  assigned  is 
equal  to  the  number  of  slots  required  by  the  LAW  host  (i.e.,  the  r_slots  parameter) 
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multiplied  by  a  heuristic  value  associated  with  the  the  physical  host’s  current  state: 
The  heuristic  value  equals  the  number  of  free  VM  slots  ( f_slots )  for  “Min  Power” 
and  the  number  of  reserved  slots  (■ r_slots )  for  “Min  BW”.  There  are  two  cases  for 
“Min  CRC”.  If  the  LAW  represents  a  Cl  workload,  the  heuristic  value  equals  the 
number  of  reserved  Cl  slots  ( d_slots );  otherwise,  it  equals  the  slot  capacity  less  the 
number  of  reserved  Cl  slots.  The  scaling  of  r_slots  ensures  that  when  a  set  of  LAW 
hosts  can  be  supported  by  multiple  physical  hosts,  the  LAW  host  with  the  largest 
VM  requirement  will  be  matched  with  the  physical  host  with  the  smallest  heuristic 
value,  and  so  on. 

Because  typically  7}  has  a  smaller  number  of  hosts  than  Tp  but  the  Hungarian  algo¬ 
rithm  requires  a  complete  bipartite  graph  of  equal  size  partitions  as  input,  we  modify 
the  bipartite  graph  as  done  in  [41],  by  1)  adding  special  edges  of  infinite  weight  (rep¬ 
resenting  “infeasibility”)  for  all  pairs  of  LAW  and  physical  hosts  that  are  not  yet 
connected,  and  2)  adding  dummy  nodes  to  T)  with  infeasible  (infinite  weight)  edges 
from  each  dummy  node  to  all  nodes  in  Tp.  As  such,  the  Hungarian  algorithm  al¬ 
ways  returns  a  complete  minimum  weight  matching,  but  this  matching  may  include 
some  infinite  weight  edges,  representing  “no  feasible  assignment”  [41].  After  running 
the  Hungarian  algorithm  on  the  modified  bipartite  graph,  we  remove  the  dummy 
nodes  (and  associated  edges)  from  the  returned  matching.  If  the  remaining  mapping 
still  contains  edge(s)  with  infinite  weight,  then  it  is  safe  to  to  say  that  Tp  cannot 
support  Tp 

Once  the  feasibility  between  each  pair  of  LAW  and  physical  ToR’s  is  determined, 
the  same  process  repeats  for  nodes  upward  the  LAW  hierarchy  (e.g.,  first  the  two 
aggregation  switches,  and  then  the  core  switch  for  the  example  scenario  in  Figure  3.1), 
until  (i)  the  LAW  root  (core)  is  reached,  at  which  point  the  LAW  is  determined  to 
be  feasible,  or  (ii)  if,  for  some  intermediate  node  of  L ,  no  feasible  mappings  exist 
between  it  and  a  physical  counterpart,  then  ALLOCATE- LAW  returns  false  and  LAW 
L  is  determined  to  be  infeasible  for  P. 

ALLOCATE- LAW (T,  P,  h)  returns  true  if  and  only  if  the  Kuhn-Munkres  algorithm 
returns  a  finite  sum  of  edge  weights.  At  each  level  of  feasibility  checking,  a  finite  edge 
weight  is  assigned  to  a  prospective  In  — >  In.pn  mapping  if  and  only  if  the  mapping 
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is  feasible,  i.e.,  the  physical  node  In.pn  has  sufficient  VM  slots  and  available  uplink 
bandwidth  to  meet  the  requirements  of  the  LAW  node  In.  And  because  the  Kuhn- 
Munkres  algorithm  returns  the  set  of  edges  comprising  a  minimum- weight  matching, 
the  sum  of  the  edge  weights  returned  by  the  Kuhn-Munkres  algorithm  is  finite  if 
and  only  if  there  exists  a  feasible  In  — >  In.pn  mapping  for  each  LAW  node.  Thus, 
Allocate- LAW (L,  P,  h)  returns  true  if  and  only  if  there  exists  a  feasible  mapping 
for  each  LAW  node. 

General  LAW  Allocation 

Once  a  LAW  L  is  determined  to  be  feasible  as  described  previously  (lines  4  to  11), 
Algorithm  3.3  proceeds  to  determine  the  best  In  — y  In.pn  mappings  using  a  top-down 
approach,  from  the  LAW  root  (core)  down  to  the  LAW  leaf  nodes  (hosts),  selecting 
the  best  LAW-to-physical  mappings  at  each  level  of  the  hierarchy  based  upon  the 
values  of  the  saved  matchings  that  minimize  the  chosen  heuristic  (lines  15  to  20). 
Next,  after  the  ToR  mappings  have  been  determined,  the  LAW  VMs  are  allocated 
according  to  their  respective  In  — y  In.pn  mappings  (line  22).  Finally,  VM  placements 
are  adjusted  to  ideal  placement  for  application  objectives  by  calling  Algorithm  3.1. 

Complexity  Analysis 

Space  Complexity  =  0(|L|  -C2),  where  C  represents  the  maximum  number  of  chil¬ 
dren  of  any  node  pn  G  P.  Space  complexity  is  dominated  by  one  of  two  factors:  1) 
the  number  of  edges  in  the  largest  bipartite  graph  G,  which  may  contain  up  to  G2 
edges,  and  2)  the  total  number  of  edges  maintained  in  the  saved  matchings.  Because 
each  of  the  0(|L|)  internal  LAW  nodes  may  contain  up  to  G  matchings  of  size  G,  the 
total  number  of  edges  maintained  in  the  saved  matchings  is  0(\L\  ■  C2).  Thus,  the 
resultant  space  complexity  is  0(C 2  +  \L\  ■ C 2)  =  0(\L\  ■ C 2). 

Time  Complexity  =  0(|L|  •  G4).  Time  complexity  is  dominated  by  the  execution  time 
of  the  Kuhn-Munkres  algorithm,  which  runs  in  0(G3)  and  executes  up  to  G  times 
for  each  of  the  0(\L\)  internal  LAW  nodes,  thus  yielding  a  resultant  time  complexity 
of  0(|L|-G4). 
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LAW  Allocation  Example 

We  conclude  this  section  by  illustrating  the  execution  of  Algorithm  3.3  for  each  heuris¬ 
tic  method  to  allocate  workloads  in  the  example  scenario  presented  in  Section  3.2. 
First,  ideal  LAWs  are  constructed  for  each  workload,  depicted  in  Figure  3.2,  using 
Algorithm  3.2.  Next,  Algorithm  3.3  allocates  the  workloads  in  the  order  they  arrive 
(Rl,  R2,  R3,  R4),  resulting  in  the  allocations  shown  in  Figure  3.3,  depending  on  the 
chosen  allocation  heuristic. 


(a)  LAW  Rl,  R3 


(b)  LAW  R2  (c)  LAW  R4 

Figure  3.2:  LAWs  for  workloads  R1-R4  used  in  the  example  scenario  of  Fig.  3.1.  The 
r_slots  and  r_bw  annotations  are  omitted  for  simplicity. 


Note  that  LAW  R4  is  not  feasible  using  the  “Min  CRC”  heuristic  method,  since  the 
resultant  network  state  after  allocating  LAWs  R1-R3  does  not  permit  any  feasible 
allocation  of  LAW  R4.  Therefore,  mapping  an  entire  LAW  subtree  to  the  physical 
infrastructure  in  a  single  atomic  step  appears  to  incur  a  tradeoff  between  performance 
gain  and  an  increased  likelihood  of  infeasibility. 

Observation  on  Infeasibility 

Although  not  necessarily  intuitive,  the  reason  why  “Min  CRC”  encounters  infeasibility 
sooner  than  the  other  heuristic  methods  is  relatively  straightforward.  Because  “Min 
CRC”  explicitly  seeks  the  maximum  spread  of  Cl  workload  VMs  across  ToRs  and 
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(a)  “Min  Power” 


(b)  “Min  BW”  (c)  “Min  CRC” 

Figure  3.3:  LAW  allocations  using  “Min  Power,”  “Min  BW,”  and  “Min  CRC”  heuris¬ 
tics  for  example  scenario. 

hosts  within  ToRs,  as  more  Cl  LAWs  are  allocated,  it  becomes  increasingly  difficult 
to  allocate  DI  LAWs,  which  require  the  use  of  a  large  number  VM  slots  on  a  single 
host  in  order  to  achieve  minimum  RCI.  In  the  case  of  the  example  scenario,  LAW  R4 
could  not  be  feasibly  allocated  given  the  allocations  of  LAWs  R1-R3  using  “Min  CRC” 
(Figure  3.3c).  Thus,  in  order  to  allocate  workload  R4,  either  the  use  of  a  per  VM 
allocation  method,  such  as  GASO,  or  some  method  for  deallocating  and  reallocating 
previous  LAWs  to  make  room  for  the  next  one,  e.g.,  a  backtracking  approach,  is 
required.  In  the  next  section,  we  will  investigate  this  performance  vs.  infeasibility 
tradeoff  further  using  a  large-scale  data  center  scenario,  and  subsequently  propose 
our  Statistical  LAW  solution  to  this  infeasibility  challenge  in  Section  3.5. 

3.3.4  Applying  LAWs  to  Different  Topologies 

In  this  section  we  describe  how  to  apply  LAWs  to  different  types  of  physical  network 
topologies  like  fat-tree  [12],  VL2  [42],  and  BCube  [43].  Because  LAWs  are  topology- 
dependent,  by  definition,  it  is  important  to  understand  how  LAW  construction  and 
allocation  may  be  performed  for  different  types  of  physical  topologies.  Fat-tree  and 
VL2  are  both  multi-rooted  hierarchical  trees  with  redundant  links  connecting  devices 
between  levels  of  the  tree  hierarchy.  BCube  is  a  non- hierarchical  topology  that  uses  a 
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number  of  vertically  arranged  switches  in  addition  to  the  ToR  switches,  which  provide 
end-to-end  server  communication  via  server-to-switch  network  links  exclusively,  i.e. 
there  are  no  physical  connections  between  switches. 

Although  the  algorithms  presented  in  this  chapter  assume  a  single-rooted,  simple  hi¬ 
erarchical  tree  topology  about  which  to  construct  and  allocate  LAWs,  applying  the 
LAW  Construction  (Alg.  3.2))  and  LAW  Allocation  (Alg.  3.3)  algorithms  to  han¬ 
dle  different  types  of  topologies  is  relatively  straightforward.  First,  Algorithm  3.2  is 
topology  agnostic,  and  hence  applicable  “as-is”  for  constructing  LAWs  for  any  arbi¬ 
trary  data  center  topology,  since  it  takes  the  physical  topology  as  an  input  parameter, 
and  subsequently  uses  an  operator-specified  per  VM  allocation  method  assumed  to 
place  the  VMs  onto  the  topology  in  a  desirable  fashion. 

Second,  although  Algorithm  3.3  is  topology  dependent,  it  can  be  extended  to  han¬ 
dle  multi-rooted,  redundant  link  topologies  with  the  addition  of  a  single  “dummy” 
root  node  of  which  each  of  the  core  switches  comprising  the  multi-rooted  topology 
are  made  children  of  the  dummy  root  using  dummy  network  links  of  capacity  =  0. 
Redundant  links  are  accounted  for  by  way  of  setting  multiple  “uplink  BW”  variables 
for  devices  with  multiple  available  upstream  connections.  When  placing  a  workload 
across  devices  with  several  available  uplinks  of  sufficient  capacity,  a  method  of  select¬ 
ing  one  must  be  chosen.  Link  selection  heuristics  such  as  “Min  Congestion”  or  “Min 
Available  BW”  may  be  appropriate.  By  way  of  these  implementation  adjustments, 
we  can  soundly  perform  LAW  allocation  for  multi-rooted  and  redundant  hierarchical 
network  topologies  using  Algorithm  3.3. 

For  non-hierarchical  data  center  topologies  such  as  BCube,  Algorithm  3.3  can  be 
extended  by  first  using  the  dummy  root  strategy  described  above,  and  then  by  making 
all  non-hierarchical  component  switches  children  of  the  root  using  dummy  network 
links  of  capacity  =  0.  However,  although  this  approach  remains  sound,  it  may  not 
be  very  well  suited  for  large  non-hierarchical  data  center  networks,  as  the  value  of 
C  (maximum  number  of  children  of  any  node),  will  likely  be  large  clue  to  the  high 
number  of  dummy  root  children  created  when  attempting  to  adapt  this  approach  to 
handle  a  non-hierarchical  network. 
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3.4  Evaluation 


In  this  section,  we  evaluate  the  performance  of  proposed  LAW  based  placement  heuris¬ 
tics  against  existing  solutions  in  a  simulated  (relatively)  large-scale  data  center  en¬ 
vironment  comparable  to  ones  used  in  related  work  [4].  The  topology  consists  of  40 
aggregation  switches,  160  ToR  switches  (4  x  ToRs  per  aggregate),  2560  hosts  (16  x 
hosts  per  ToR),  and  40960  VM  slots  (16  x  VM  slots  per  host). 

3.4.1  Setup 

Representation  of  Existing  Solutions 

We  choose  to  adapt  the  GASO  allocation  algorithm  presented  in  [28]  to  represent 
existing  solutions.  GASO  is  chosen  because  it  is  extensible  to  an  arbitrary  number 
of  disparate  objectives  and  customizable  by  the  operator  via  weightings  for  each 
objective,  similar  to  [10,11].  For  our  purpose,  the  global  cost  function  is  a  weighted 
sum  of  power  consumption  and  mean  link  bandwidth  usage.  By  default,  GASO 
uses  a  greedy  search  for  solutions  that  will  minimize  the  global  cost,  and  as  such 
may  converge  prematurely  to  local  minima  [28].  Therefore,  we  enhance  GASO  by 
implementing  a  genetic  algorithm  to  diversify  candidate  solutions  as  done  in  related 
work  [16,28].  We  set  the  candidate  solution  population  size  to  10,  and  perform  up  to 
25  evolutions  for  each  workload  request;  these  parameters  are  shown  to  be  required 
for  the  size  of  our  topology  [16,28]. 

Per  VM  Application- Aware  Solutions 

To  make  our  evaluation  more  complete,  we  seek  to  understand  how  the  performance 
of  LAW  based  heuristics  compare  to  that  of  solution  that  is  application-aware  but 
places  workloads  one  VM  at  a  time.  We  call  the  latter  a  per  VM  application-aware 
solution.  We  have  developed  such  a  solution  by  further  enhancing  GASO.  Therefore, 
we  evaluate  two  distinct  implementations  of  GASO:  1)  The  “default”  version,  which 
weighs  power  and  BW  usage  highest,  and  2)  an  “application-aware”  version,  which 
weighs  RCI  and  CRC  highest.  Both  versions  of  GASO  we  use  the  power  and  BW  usage 
reduction  heuristics  from  [28].  The  application-aware  version  additionally  uses  an 
RCI  and  CRC  reduction  heuristic  that  executes  Algorithm  3.1  to  move  (if  necessary) 
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application  VMs  between  hosts  in  each  rack  to  reduce  RCI  and  CRC.  Since  the 
purpose  of  the  default  version  of  GASO  is  to  model  non-application-aware  solutions, 
it  does  not  use  this  application-aware  heuristic. 

In  addition,  we  refine  the  LAW  based  heuristics  so  that  when  it  is  infeasible  to  allocate 
a  LAW  in  one  atomic  step,  they  resort  to  per  VM  allocation  for  that  workload. 

Workload  Traces 

The  workloads  are  randomly  generated  and  their  size  distributions  are  comparable  to 
what  used  in  related  work  [4],  Specifically,  four  types  of  workloads  are  used  to  model 
heterogeneous  [4]  compute- intensive  and  network-intensive  resource  requirements: 

Type  1:  <CI,  numVMs  x  4  slots,  5  Mbps,  0.5  WCS> 

Type  2:  <CI,  numVMs  x  2  slots,  10  Mbps,  0.7  WCS> 

Type  3:  <DI,  numVMs  x  1  slot,  60  Mbps,  0.5  WCS> 

Type  4:  <DI,  numVMs  x  2  slots,  30  Mbps,  0.7  WCS> 

where  numVMs  ranges  between  40  and  200,  and  also  includes  the  number  of  slots 
per  VM  to  represent  heterogeneous  workload  requirements,  as  done  in  [4], 

In  each  run,  the  workload  trace  is  produced  by  selecting  one  of  the  workload  types 
(1-4)  and  a  value  for  numVMs  (40-200)  uniformly  at  random,  and  deducting  the 
number  of  slots  required  of  the  workload  from  the  total  number  of  slots  available  in 
the  physical  infrastructure.  Each  randomly  generated  workload  is  added  to  the  trace 
until  the  next  generated  workload  exceeds  the  infrastructure  slot  capacity.  After 
the  workload  trace  is  generated,  each  heuristic  under  evaluation  attempts  to  place 
workloads  of  the  trace  in  the  order  that  they  were  generated,  i.e.,  online,  to  the 
simulated  infrastructure. 

Performance  Metrics 

Our  evaluation  focuses  on  the  following  metrics:  (1)  power  usage,  2)  BW  usage  (mean 
link  BW),  3)  link  congestion  (maximum  link  BW),  4)  mean  RCI,  5)  CRC,  and  6) 
execution  time.  We  also  seek  to  understand  the  extent  of  LAW  allocation  infeasibility 
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tradeoff.  Specifically,  for  each  run  we  identify  the  fraction  of  VM  capacity  allocated 
when  the  first  infeasible  LAW  occurs. 

3.4.2  Results 

Here,  we  present  the  results  of  LAW  and  GASO  allocation  methods  over  an  average  of 
ten  workload  traces  by  depicting  their  performance  in  power  consumption  (Fig.  3.4), 
BW  usage  (Fig.  3.5),  link  congestion  (Fig.  3.6),  CRC  (Fig.  3.7),  RCI  (Fig.  3.8),  and 
execution  time9  (Fig.  3.9).  Plotted  points  in  each  figure  represent  the  average  objec¬ 
tive  metric  values  of  a  feasible  state  for  a  given  fraction  of  VM  capacity  allocated  (i.e., 
physical  infrastructure  utilization)  over  a  series  of  ten  runs  using  different  allocation 
methods.  The  vertical  lines  are  color  coordinated  to  match  the  allocation  methods 
and  each  represents  the  average  capacity  allocated  when  the  first  infeasibility  occurred 
for  the  corresponding  allocation  method. 


Figure  3.4:  Power  usage  vs.  capacity  allocated. 

Based  on  these  results,  we  make  several  observations.  First,  observe  that  the  “Min 
Power”  LAW  allocation  heuristic  dominates  the  default  GASO  per  VM  allocation 
method.  The  “Min  Power”  LAW  allocation  heuristic  offers  similar  power  conserva- 

9The  simulation  consists  of  approximately  3000  lines  of  Java  code,  and  was  run  (serially)  using 
64-bit  JVM.  The  host  PC  was  running  64-bit  Windows  on  an  Intel  2.4  GHz  quad-core  processor 
with  24  GB  of  RAM. 


Figure  3.5:  Mean  BW  usage  vs.  capacity  allocated. 


Figure  3.6:  Max.  link  BW  usage  vs.  capacity  allocated. 


tion,  BW  usage,  and  CRC  as  default  GASO,  but  additionally  provides  much  lower 
RCI,  with  an  execution  time  that  is  an  order  of  magnitude  faster.  Thus,  based  on 
this  comparison  it  is  clear  that  application-aware  allocation  for  this  scenario  may 
be  strictly  superior  to  non-application-aware  allocation,  i.e.,  tenant  application  ob¬ 
jectives  may  be  achieved  nearly  “for  free,”  with  little  to  no  degradation  of  operator 
objectives.  Next,  observe  the  straightforward  tradeoff  between  power  usage  and  CRC. 

Default  GASO  and  “Min  Power”  maintain  relatively  high  CRC  in  order  to  minimize 
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Figure  3.7:  CRC  vs.  capacity  allocated. 


Figure  3.8:  Mean  RCI  vs.  capacity  allocated. 

power  usage,  whereas  App-Aware  GASO,  “Min  BW,”  and  “Min  CRC”  tradeoff  higher 
power  consumption  for  reduced  CRC.  Furthermore,  although  App-Aware  GASO  is 
designed  to  jointly  minimize  RCI  and  CRC,  because  it  is  bounded  by  the  parameters 
and  limitations  of  a  genetic  algorithm  to  perform  its  search  (e.g.,  population  size, 
number  of  evolutions,  etc),  it  does  not  achieve  ideal  Mean  RCI  as  the  physical  infras¬ 
tructure  becomes  utilized  when  compared  to  the  LAW  allocation  heuristics,  which 
each  explicitly  preserve  the  RCI  for  each  constructed  LAW  when  making  allocations. 
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Figure  3.9:  Execution  time  vs.  capacity  allocated. 


This  RCI  gap  between  GASO  App-Aware  and  LAW  becomes  more  pronounced  as  the 
infrastructure  becomes  more  utilized.  At  30%  utilization,  RCI  for  GASO  App-Aware 
is  only  slighty  higher  than  LAW,  but  at  60%  utilization,  RCI  for  GASO  App-Aware 
is  more  than  double  that  of  LAW. 

Tradeoffs  involving  bandwidth  are  less  straightforward.  LAW  allocation  preserves  the 
provisioned  BW  for  each  constructed  LAW,  and  both  GASO  variants  include  mean 
link  BW  as  an  optimization  criterion,  so  differences  in  BW  usage  (Fig.  3.5)  are  rel¬ 
atively  insignificant.  The  maximum  link  BW  plot  (Fig.  3.6)  provides  an  indicator 
of  link  congestion.  Observe  that  the  “Min  BW”  allocation  heuristic  provides  signifi¬ 
cantly  lower  link  congestion  than  the  other  allocation  methods  until  the  infrastructure 
becomes  about  50%  utilized. 

Regarding  feasibility,  observe  that  each  LAW  allocation  heuristic  encounters  infeasi¬ 
bility  at  widely  different  levels  of  physical  infrastructure  utilization.  The  reason  for 
this  is  complex,  as  several  factors,  including  workload  size,  workload  trace  composi¬ 
tion  (Cl  :  DI  ratio),  physical  topology  tree  structure,  physical  topology  utilization, 
and  heuristic  allocation  method,  determine  the  likelihood  of  encountering  LAW  in¬ 
feasibility.  Moreover,  in  this  scenario  there  is  clear  trend  indicating  that  allocation 
methods  that  more  aggressively  attempt  to  spread  Cl  workloads,  i.e. ,  reduce  CRC, 
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are  more  likely  to  encounter  infeasibility  sooner.  “Min  Power”  prefers  maximum  con¬ 
centration  of  all  workloads  and  has  the  highest  feasibility  rating  (83.6%).  “Min  BW” 
prefers  maximum  distribution  of  all  workloads  and  has  moderate  feasibility  (60.7%). 
“Min  CRC”  explicitly  spreads  Cl  workloads  and  has  the  least  feasibility  (30.1%). 

Finally,  observe  that  “Min  CRC,”  while  having  relatively  low  feasibility,  appears  to 
dominate  GASO  App-Aware  in  this  scenario,  by  achieving  lower  BW  usage,  link 
congestion,  CRC,  and  RCI  while  using  generally  the  same  amount  of  power.  Fur¬ 
thermore,  regarding  algorithm  execution  time  (Fig.  3.9),  LAW  allocation  (hundreds 
of  milliseconds)  is  an  order  of  magnitude  faster  than  the  GASO  variants  (tens  of 
seconds) . 

3.4.3  LAW  for  Architectures  of  Different  Dimensions 

In  this  section,  we  look  at  how  varying  the  physical  dimensions  of  hierarchical  net¬ 
work  architectures  affect  allocation  feasibility  of  different  LAW  placement  heuristics. 
Here,  physical  dimensions  are  defined  in  terms  the  four-tuple  (A,T,H,S),  where  A 
represents  the  number  of  aggregate  switches  under  each  core  switch,  T  represents  the 
number  of  ToR  switches  under  each  aggregate  switch,  H  represents  the  number  of 
host  servers  under  each  ToR  switch,  and  S  represents  the  number  of  VM  slots  per  host 
server.  Regarding  different  values  for  these  dimensions,  we  are  particularly  concerned 
with  how  LAW  feasibility  is  affected  as  the  network  architecture  is  “scaled-out,”  or 
made  more  horizontal  by  increasing  the  number  of  physical  devices  (i.e. ,  increasing 
the  values  of  A,  T ,  and  H),  as  well  as  “scaled-up,”  or  made  more  vertical  by  increasing 
the  capacity  of  each  physical  host  server  (i.e.,  increasing  the  value  of  S). 

In  order  to  provide  a  fair  LAW  feasibility  comparison  across  architectures  of  varying 
physical  proportions,  the  total  VM  slot  capacity  for  each  architecture  is  held  constant 
at  10240  slots.  Thus,  although  each  architecture  evaluated  in  this  section  has  different 
values  for  ( A,T,H,S ),  they  all  have  the  same  slot  capacity,  i.e.,  for  each  architecture, 
A-T ■  H ■  S  =  10240.  The  workload  traces  used  are  the  same  as  before  (Section  3.4.1). 

Table  3.1  illustrates  the  feasibility  results  for  each  of  the  “Min  Power,”  “Min  BW,” 
and  “Alin  CRC”  LAW  placement  heuristics  across  a  range  of  hierarchical  network 


62 


architectures  with  different  dimensions  and  a  10240  VM  slot  capacity.  The  range  of 
dimensions  in  this  table  represents  a  spectrum  of  hierarchical  network  architectures 
that  are  scaled-up  and/or  scaled-out  to  varying  degrees.  The  rows  closer  to  the  top 
of  the  table  represent  network  architectures  that  are  proportionally  scaled-up,  while 
those  closer  to  the  bottom  represent  more  scaled-out  architectures. 


Allocation  Heuristic  / 
Physical  Dimensions 

Min  Power 

Min  BW 

Min  CRC 

(5,  4,  16,  32) 

0.44 

0.18 

0.27 

(5,  4,  32,  16) 

0.58 

0.38 

0.28 

(5,  8,  16,  16) 

0.72 

0.33 

0.33 

(10,  4,  16,  16)* 

0.70 

0.48 

0.30 

(10,  8,  8,  16) 

0.72 

0.48 

0.38 

(10,  8,  16,  8) 

0.77 

0.50 

0.56 

(10,  16,  16,  4) 

0.81 

0.50 

0.63 

(20,  8,  16,  4) 

0.82 

0.50 

0.66 

Table  3.1:  LAW  feasibility  results  for  “Min  Power,”  “Min  BW,”  and  “Min  CRC”  for 
different  10240  VM  slot  infrastructures  of  varying  dimensions.  The  values  denote  the 
average  physical  infrastructure  utilization  (fraction  of  VM  capacity  allocated)  when 
the  corresponding  allocation  heuristic  and  LAW  type  first  encounter  infeasibility. 
Poor  results  (less  than  50%)  are  shaded  red,  moderate  (at  least  50%  but  less  than 
80%)  yellow,  and  good  (at  least  80%)  green.  *  reference  dimensions 

From  these  results,  we  observe  two  clear  trends,  using  the  (10,4,16,16)  dimension  ar¬ 
chitecture  as  a  reference  point.  First,  as  an  architecture  is  scaled  “out”  and  “down” 
from  (10,4,16,16)  to  (20,8,16,4)  i.e. ,  approaches  dimensions  closer  to  the  bottom  of 
Table  3.1,  LAW  feasibility  more  than  doubles  for  “Min  CRC”  (120%  increase),  and 
increases  significantly  for  “Min  Power”  (17%  increase),  but  only  increases  marginally 
for  “Min  BW”  (4%  increase).  Intuitively,  these  results  should  be  expected:  by  increas¬ 
ing  the  number  of  switches  and  servers,  there  is  more  room  to  spread  Cl  workloads 
throughout  the  infrastructure,  resulting  in  much  higher  feasibility  for  “Min  CRC,” 
but  less  so  for  the  other  heuristics,  as  their  allocation  feasibilities  are  not  as  dependent 
as  “Min  CRC”  on  the  ability  of  the  infrastructure  to  handle  distributed  Cl  workloads. 
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Second,  as  an  architecture  is  scaled  “up”  and  “in”  from  (10,4,16,16)  to  (5,4,16,32), 
i.e.,  approaches  dimensions  closer  to  the  top  of  Table  3.1,  LAW  feasibility  drastically 
decreases  for  all  heuristics  due  to  the  inability  to  spread  Cl  workloads  throughout  the 
infrastructure.  Although  allocation  feasibilities  for  “Alin  Power”  and  “Min  BW”  are 
generally  not  as  infrastructure-sensitive  as  “Min  CRC,”  as  the  network  architecture 
approaches  more  vertical  physical  topologies  like  (5,4,16,32),  the  resultant  inability 
to  spread  Cl  workloads  begins  to  hinder  allocation  feasibility  regardless  of  heuristic 
type  (less  than  50%  feasibility  across  all  heuristic  types  in  this  example,  as  illustrated 
by  the  first  row  of  Table  3.1). 

Therefore,  based  on  these  results,  we  conclude  that  scaled-out  architectures  are  gener¬ 
ally  better  for  achieving  higher  LAW  feasibility  versus  scaled-up  architectures.  How¬ 
ever,  this  problem  of  finding  the  physical  network  architecture  that  yields  the  highest 
LAW  feasibility  for  a  given  workload  forecast  (e.g.,  trace)  and  LAW  allocation  heuris¬ 
tic  seems  like  a  fertile  ground  for  future  work. 

3.5  Statistical  LAWs 

As  seen  from  the  the  execution  time  plot  (Figure  3.9),  the  GASO  per  VM  workload 
placement  algorithm  runs  an  order  of  magnitude  slower  than  LAW  allocation,  and  as 
such,  is  less  suitable  for  online  workload  placement  tasks,  particularly  for  large  data 
center  networks.  The  GASO  execution  time  is  comparable  to  other  workload  place¬ 
ment  solutions  that  use  genetic  algorithms10,  such  as  [16,28].  Therefore,  although 
per  VM  allocation  may  be  used  to  address  the  problem  of  allocating  a  workload 
when  LAW  infeasibility  is  encountered,  we  argue  that  an  online  solution  that  grace¬ 
fully  relaxes  the  ideal  application  allocation,  as  represented  by  the  LAW  structure,  is 
preferable  in  the  interests  of  both  reduced  RCI  and  reduced  execution  time.  We  also 
considered  an  alternative  LAW  backtracking  approach  to  address  the  issue  of  LAW 
infeasibility,  but  such  an  approach  disrupts  current  allocations  and  initial  evaluations 
significantly  degraded  execution  time,  so  we  defer  the  exploration  of  such  alternative 
options  to  future  work. 

10Greedy  heuristics,  like  those  used  in  [3, 4]  run  faster  (order  of  seconds)  for  placing  large  work¬ 
loads,  but  they  are  prone  to  suboptimal  convergence  to  local  minima  when  multiple  objectives  are 
considered. 
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Therefore,  we  propose  a  Statistical  LAW  allocation  strategy  to  explore  the  LAW  fea¬ 
sibility  vs.  performance  tradeoff.  A  Statistical  LAW  is  natural  LAW  relaxation  which 
offers  a  compromise  between  complete  LAW  and  per  VM  placement  approaches,  by 
representing  a  percentage  of  the  workload  VMs  as  a  “relaxed”  LAW,  and  considering 
the  remainder  of  the  workload  VMs  as  individual  units  of  allocation  for  some  per  VM 
allocation  method. 

3.5.1  Example 

The  70%  Statistical  LAW  model  of  workload  R4  from  the  Section  3.2  example  scenario 
is  illustrated  in  Figure  3.10,  and  represents  the  preservation  of  at  least  70%  of  the 
original  LAW  VMs  (6  VMs)  while  using  per  VM  allocation  for  the  remainder  (2 
VMs).  Figure  3.11  depicts  the  resultant  network  state  after  using  the  “Min  CRC” 
LAW  heuristic  to  allocate  the  70%  LAW  for  R4  (Figure  3.10)  to  the  physical  network 
state  depicted  in  Figure  3.3c. 


Figure  3.10:  Statistical  LAW  (70%)  for  workload  R4  in  example  scenario. 


By  using  a  statistical  measure  to  relax  LAW  VM  positioning  requirements,  a  compro¬ 
mise  between  ideal  workload  allocation  and  feasibility  is  achieved,  while  still  main¬ 
taining  the  fast  LAW  allocation  times  compared  to  traditional  per  VM  approaches. 


3.5.2  Construction  and  Allocation 

An  x%  Statistical  LAW  for  some  original  LAW  L  is  constructed  by  removing 
(100  —  x)%  of  L’s  VMs  (floor)  from  L  by  removing  the  VMs  sequentially,  where  each 
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Figure  3.11:  “Min  CRC”  LAW  allocation  using  a  70%  Statistical  LAW  to  allocate 
previously  infeasible  workload  R4  for  the  example  scenario. 

VM  is  removed  from  the  LAW  host  containing  the  most  VMs  (or  from  a  host  under 
the  most  populated  ToR  in  the  case  of  a  tie).  The  removed  VMs  are  maintained 
with  the  LAW,  and  are  allocated  in  a  per  VM  fashion  after  the  Statistical  LAW  is 
allocated.  For  Statistical  LAW,  the  goal  of  the  per  VM  allocation  heuristic  for  the 
“remainder”  VMs  is  to  minimize  RCI  subject  to  WCS  constraints.  Hence,  for  Cl 
workloads,  the  per  VM  allocations  strive  to  minimize  host  affinity  by  allocating  the 
VMs  to  hosts  with  the  fewest  number  of  Cl  slots  allocated.  For  DI  workloads,  these 
allocations  seek  to  maximize  host  affinity  by  concentrating  the  VMs  as  closely  as 
possible  to  the  rest  of  the  LAW  VMs  without  violating  the  WCS  bound. 


3.5.3  Statistical  LAW  Results 

Here,  we  present  the  results  of  Statistical  LAW  allocation  using  the  same  workload 
traces  and  simulated  physical  infrastructure  as  the  Section  3.4  evaluation.  Statistical 
LAW  allocation  methods  use  a  progressive  backoff  approach.  For  each  workload  in  the 
trace,  first  complete  LAW  allocation  is  attempted,  then  progressively  90%,  70%,  50%, 
and  finally  30%  Statistical  LAW  allocations  are  attempted  if  the  previous  Statistical 
LAW  allocation  attempt  failed  to  feasibly  allocate  the  workload.  For  Statistical  LAW, 
the  workload  is  considered  infeasible  only  if  all  allocation  attempts  (complete,  90%, 
70%,  50%,  30%)  fail.  Of  course,  an  even  lower  Statistical  LAW,  such  as  10%  LAW 
may  be  attempted  if  30%  LAW  allocation  fails,  but  at  that  point  we  observe  the 
resultant  LAW  structure  would  be  too  degraded  to  provide  much  benefit. 
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Figure  3.12:  Effect  of  Statistical  LAW  on  CRC. 
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Figure  3.13:  Effect  of  Statistical  LAW  on  Mean  RCI. 


The  Statistical  LAW  results  as  shown  in  Table  3.2  clearly  demonstrate  the  advan¬ 
tage  of  using  a  Statistical  LAW  approach  vs.  per  VM  allocation  when  complete  LAW 
allocation  is  not  feasible.  Notably,  in  this  scenario,  the  Statistical  LAW  allocation 
performs  similarly  to  complete  LAW  allocation,  with  both  RCI  and  execution  time 
increased  by  only  a  small  constant  factor  as  workloads  become  overwhelmingly  infea¬ 
sible  for  Statistical  LAW,  which  begins  to  occur  around  85%  infrastructure  utilization, 
as  seen  in  Figures  3.13  and  3.14,  respectively.  In  other  words,  for  each  resource  usage 
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Allocation  Heuristic/ 
LAW  Type 

Min  Power 

Min  BW 

Min  CRC 

Complete  LAW 

0.84 

0.61 

0.30 

90%  Stat.  LAW 

0.84 

0.63 

0.35 

70%  Stat.  LAW 

0.86 

0.67 

0.62 

50%  Stat.  LAW 

0.87 

0.80 

0.86 

30%  Stat.  LAW 

0.91 

0.89 

0.90 

Table  3.2:  Statistical  LAW  feasibility  results  for  “Min  Power,”  “Min  BW,”  and  “Min 
CRC”  for  the  large-scale  evaluation.  The  values  denote  the  average  physical  in¬ 
frastructure  utilization  (fraction  of  VM  capacity  allocated)  when  the  corresponding 
allocation  heuristic  and  LAW  type  first  encounter  infeasibility. 

metric,  there  is  very  little  performance  degradation  for  using  Statistical  LAW  com¬ 
pared  to  complete  LAW  allocation,  as  can  be  seen  by  observation  of  plotted  points 
for  Statistical  LAW  allocation  methods  beyond  the  amount  of  physical  infrastructure 
utilization  at  which  complete  LAW  allocation  fails  (values  in  the  first  row  of  Table 
3.2,  denoted  by  dotted  vertical  lines  in  each  figure).  For  instance,  in  Figure  3.12,  al¬ 
though  the  “Min  CRC”  LAW  placement  heuristic  begins  to  incur  slightly  higher  CRC 
than  App- Aware  GASO  at  approximately  40%  infrastructure  utilization,  remains  very 
similar  (within  a  few  percentage  points)  throughout  the  entire  trace. 


3.6  LAWs  for  Workload  Prioritization 

As  demonstrated  in  the  previous  section,  using  Statistical  LAW  allocation  may  greatly 
improve  LAW  allocation  feasibility  versus  complete  LAW  allocation,  while  still  pro¬ 
ducing  resource  and  application  efficient  allocations  with  fast  workload  placement 
times,  ffowever,  because  Statistical  LAW  may  result  in  increased  application  RCI 
for  statistically  allocated  workloads,  the  operator  may  prefer  to  take  a  proactive  ap¬ 
proach  in  applying  Statistical  LAW  allocations  to  low  priority  workloads  in  order  to 
increase  the  LAW  feasibility  of  high  priority  workloads  to  ensure  minimum  RCI  for 
them.  In  this  section,  we  propose  the  Early  Statistical  LAW  allocation  strategy,  a 
proactive  approach  that  trades  off  suboptimal  placement  of  low  priority  workloads  to 
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Figure  3.14:  Effect  of  Statistical  LAW  on  execution  time, 
increase  LAW  feasibility  for  high  priority  workloads. 


3.6.1  Early  Statistical  LAW  Allocation 

In  this  work,  we  explore  a  simple  binary  priority  scheme  and  leave  more  elaborate 
prioritization  to  future  work.  We  seek  to  increase  the  feasibility  of  high  priority 
workloads  by  allocating  low  priority  workloads  using  Early  Statistical  LAW,  which 
differs  from  default  Statistical  LAW  allocation  only  in  the  per  VM  allocation  heuristic 
used  to  allocate  the  non-LAW  “remainder”  VMs.  While  per  VM  allocations  using 
default  Statistical  LAW  strive  to  minimize  RCI,  Early  Statistical  LAW  uses  these  per 
VM  allocations  as  a  “compromise”  to  increase  the  feasibility  for  future  high  priority 
LAWs. 

We  achieve  these  compromise  allocations  by  placing  the  remainder  Cl  VMs  in  the 
infrastructure  subtree  with  the  fewest  available  slots  using  the  classical  “tightest  fit” 
bin  packing  heuristic,  with  a  preference  for  placement  on  hosts  with  more  DI  VMs. 
Intuitively,  using  such  a  heuristic  should  provide  more  “contiguous”  VM  slot  space  in 
other  infrastructure  subtrees,  thus  providing  a  higher  likelihood  of  LAW  feasibility  for 
future  high  priority  workloads.  Because  this  additional  contiguous  slot  space  comes  at 
the  cost  of  bandwidth  efficiency,  we  do  not  compromise  the  placement  of  DI  VMs,  but 
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instead  allocate  them  in  the  same  fashion  as  default  Statistical  LAW:  with  maximum 
intra-application  host  affinity  while  satisfying  WCS  constraints. 

3.6.2  Early  Statistical  LAW  Results 

We  evaluate  Early  Statistical  LAW  using  the  same  scenario  as  before  except  this 
time  we  randomly  assign  each  workload  a  binary  priority  value  (“high”  or  “low”). 
We  compare  the  performance  of  Early  Statistical  LAW  to  default  Statistical  LAW, 
specifically  with  regard  to  the  feasibility  of  high  priority  workloads.  Each  low  priority 
workload  is  allocated  as  a  50%  Statistical  LAW,  using  the  compromise  placement 
heuristic  described  in  the  preceding  paragraph,  while  high  priority  workloads  are 
allocated  identically  to  Statistical  LAW,  using  the  same  progressive  backoff  schedule. 


Allocation  Heuristic  / 
High  Priority  LAW  Type 

Min  Power 

Min  BW 

Min  CRC 

Complete  LAW 

0.95  (14.5%) 

0.63  (5.0%) 

0.43  (43.3%) 

90%  Stat.  LAW 

0.96  (14.3%) 

0.67  (8.1%) 

0.51  (45.7%) 

70%  Stat.  LAW 

0.97  (12.8%) 

0.85  (28.8%) 

0.80  (29.0%) 

50%  Stat.  LAW 

0.98  (12.6%) 

0.89  (11.3%) 

0.97  (12.8%) 

30%  Stat.  LAW 

0.99  (8.8%) 

0.96  (7.9%) 

0.99  (11.2%) 

Table  3.3:  Early  Statistical  LAW  feasibility  results  for  high  priority  workloads  in 
the  large-scale  evaluation.  The  raw  values  denote  the  average  physical  infrastructure 
utilization  when  the  corresponding  allocation  heuristic  and  LAW  type  first  encounter 
high  priority  workload  infeasibility.  The  values  in  parentheses  denote  the  percentage 
of  feasibility  increase  over  default  Statistical  LAW  allocation. 

Table  3.3  illustrates  the  merits  of  using  Early  Statistical  LAW  to  improve  feasibility  for 
high  priority  workloads.  By  proactively  allocating  low  priority  workloads  using  Early 
Statistical  LAW,  high  priority  feasibility  is  increased  for  each  LAW  type  versus  default 
Statistical  LAW  allocation.  Consider  “Min  CRC,”  the  LAW  allocation  heuristic  that 
is  generally  the  least  feasible.  By  using  the  Early  Statistical  LAW  allocation  strategy, 
complete  LAW  feasibility  is  increased  by  43.3%  for  high  priority  workloads,  ffowever, 
while  Early  Statistical  LAW  allocation  leads  to  better  placements  for  high  priority 
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workloads  (i.e. ,  closer  to  desired,  more  likely  to  meet  SLA),  this  increased  feasibility 
comes  at  the  cost  of  increased  RCI,  depicted  in  Figure  3.15. 
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Figure  3.15:  RCI  vs.  feasibility  tradeoff  introduced  by  Early  Statistical  LAW  (using 
Min  CRC  heuristic). 


Regarding  this  feasibility  vs.  RCI  tradeoff,  it  is  clear  that  the  increase  in  RCI  with 
Early  Statistical  LAW  is  a  result  of  suboptimal  allocations  for  low  priority  workloads, 
since  complete  LAW  allocations  preserve  RCI  and  high  priority  LAW  feasibility  is 
increased  using  this  strategy.  This  feasibility  vs.  RCI  tradeoff  should  be  considered 
by  the  operator  when  making  a  determination  to  use  Early  Statistical  LAW.  For 
example,  suppose  an  operator  decides  to  use  the  “Min  CRC”  LAW  placement  heuristic 
based  on  forecasting  evidence  that  more  Cl  workloads  than  DI  workloads  are  likely 
to  arrive.  Since  LAW  placement  using  “Min  CRC”  has  been  shown  to  have  relatively 
low  LAW  feasibility,  he/she  may  want  to  invoke  Early  Statistical  LAW  allocation  for 
low  priority  workloads,  especially  if  there  are  high  priority  tasks  or  tenants  active, 
and  accept  higher  RCI  for  low  priority  workloads  as  a  tradeoff.  On  the  other  hand, 
if  there  are  relatively  few  high  priority  tasks  or  tenants  active,  or  if  the  operator  is 
unsure  of  the  types  of  workloads  likely  to  arrive,  then  he/she  may  decide  to  use  the 
“Min  Power”  LAW  placement  heuristic  with  no  Early  Statistical  LAW  allocations, 
since  “Min  Power”  has  been  shown  to  have  relatively  high  complete  LAW  feasibility 
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as  is,  and  the  relatively  benign  workload  forecast  may  allow  electrical  cost  savings 
while  still  providing  good  RCI  for  active  tasks  and  tenants. 


3.7  Related  Work 

Prior  work  on  application-aware  workload  placement  mostly  focuses  on  maximizing 
the  performance  of  certain  applications  (e.g.,  data  intensive  [20],  high-performance 
computing  applications  [22],  etc.)  or  optimizing  specific  aspects  of  application  perfor¬ 
mance  (e.g.,  network  throughput  [23,26],  fairness  [24]).  Other  related  solutions  such 
as  CloudMirror  [3]  and  Ostro  [4]  provide  bandwidth  guarantees  and  ensure  high  ap¬ 
plication  availability.  To  the  best  of  our  knowledge,  this  work  is  the  first  to  propose 
precomputing  desired  workload  placements  for  individual  applications  (i.e.,  LAWs) 
and  subsequently  using  them  to  speed  up  and  prioritize  workload  placement  while 
meeting  per  application  performance  requirements. 

3.8  Conclusion 

We  have  demonstrated  that  an  application-aware  approach,  by  optimizing  the  relative 
positioning  of  VMs  of  individual  workloads,  can  meet  both  per  application  require¬ 
ments  and  cumulative  resource  usage  goals.  Furthermore,  we  have  proposed  a  new 
abstraction  (i.e.,  LAW)  to  enable  online  workload  placement  that  is  one  order  of 
magnitude  faster  than  existing  solutions,  and  scalable  to  large  data  center  scenarios. 

In  the  big  picture,  we  view  this  work  as  a  first  step  towards  understanding  the  tradeoff 
between  maximizing  application  performance  and  optimizing  network-wide  resource 
usage.  We  believe  there  is  a  wide  design  space  for  new  formulations  and  heuristics  to 
meet  specific  combinations  of  application-level  requirements  and  operational  goals. 
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CHAPTER  4: 
Conclusion 


Here,  we  summarize  the  merits  of  applying  each  of  the  design  principle  individually, 
i.e.,  the  distinct  benefits  of  applying  either  the  EASO  or  LAW  contribution  indepen¬ 
dently,  and  then  propose  a  new  design  strategy  for  combining  these  design  principles 
into  a  synthesized  approach  that  realizes  the  benefits  of  each,  which  we  term  “EASO- 
LAW  Synthesis.”  Finally,  the  dissertation  concludes  with  a  discussion  of  open  issues 
and  future  work. 


4.1  Summary  of  Contributions 

The  first  principle,  comprehensive  tradeoff  exploration,  provides  the  operator  with 
a  wide  range  of  meritorious  placement  options  for  a  given  workload,  but  does  so  at 
the  cost  of  execution  time,  and  without  much  consideration  for  application-specific 
resource  contentions.  The  second  principle,  LAW,  provides  rapid  workload  allocation 
times,  even  for  large  data  center  networks,  lacks  the  tradeoff  exploration  of  the  former, 
thus  requiring  the  operator  to  specify  desirable  LAW  characteristics  (i.e.,  weighted 
objectives)  a  priori. 

Comprehensive  Tradeoff  Exploration.  We  propose  a  novel,  pure  MOP  formu¬ 
lation  for  data  center  orchestration,  and  demonstrate  how  our  new  EASO  algorithm 
can  enumerate  a  wide  range  of,  and  potentially  better,  solutions  than  current  or- 
chestrators  for  relatively  large  data  center  networks  by  exploring  tradeoffs  based  on 
Pareto-dominance,  rather  than  attempting  to  reducing  the  multi-objective  tradeoff 
space  to  a  single-objective  optimization  problem  as  done  by  existing  NCF  resource 
allocation  solutions.  Also,  unlike  existing  NCF  resource  allocation  solutions,  EASO 
provides  a  set  of  proposed  allocations,  each  with  individual  merits,  rather  than  a 
single  “best”  allocation  (which  may  not  necessarily  best  achieve  operator  objectives 
if  NCF  weightings  are  set  improperly  or  if  the  orchestration  algorithm  converges  to  a 
local  minima),  as  demonstrated  in  Chapter  2. 
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LAW-based  Resource  Allocation.  By  explicitly  modeling  application  preferences 
and  developing  a  novel  LAW  abstraction  that  preserves  them,  we  have  demonstrated 
how  to  achieve  application-aware  allocation  for  data  center  workloads  at  time  scales 
that  are  an  order  of  magnitude  faster  than  other  multi-objective  workload  placement 
approaches,  while  still  achieving  high  quality  resource  allocations  in  terms  of  NCF 
and  operator  objectives.  This  is  in  contrast  to  current  per  VM  workload  placement 
approaches  that  use  a  finer  granularity  of  allocation  for  application  components  (e.g., 
VMs).  By  precomputing  larger  atomic  units  of  allocation  (e.g.,  LAWs)  that  represent 
entire  workloads,  we  have  demonstrated  the  feasibility  of  scalably  allocating  such 
LAWs  in  a  manner  that  further  reduces  contention  among  applications  while  yielding 
sub-second  workload  placement  times  for  placing  large  data  center  workloads. 


4.2  EASO-LAW  Synthesis:  Better  Than  the  Sum 
of  Its  Parts 

This  strategy  involves  first  performing  comprehensive  NCF  tradeoff  exploration  using 
EASO  to  precompute  a  wide  range  of  LAWs  desirable  to  the  network  operator,  and 
then,  as  new  tenant  workload  requirements  arrive,  operator  preferences  and  network 
conditions  are  queried  to  select  the  ideal  LAW  and  corresponding  heuristic  for  work¬ 
load  allocation.  By  implementing  such  a  design  strategy,  which  should  be  relatively 
straightforward  given  the  technical  contributions  of  Chapters  2  and  3,  and  the  content 
of  this  section,  an  orchestration  solution  that  both  performs  comprehensive  tradeoff 
exploration  and  achieves  scalable,  resource  efficient,  and  application-aware  placement 
of  large-scale  data  center  workloads,  can  be  realized. 

The  key  insight  in  building  an  effective  data  center  orchestration  solution  using  com¬ 
prehensive  tradeoff  exploration  (as  implemented  by  EASO)  and  LAW  lies  in  under¬ 
standing  how  to  synthesize  these  two  conceptually  distinct  principles  into  one,  such 
that  the  resultant  solution  enjoys  both  the  tradeoff  exploration  of  EASO,  and  the 
rapid  and  application-aware  workload  allocation  of  LAW.  To  deepen  the  understand¬ 
ing  of  how  this  may  be  achieved,  consider  Figure  4.1,  which  depicts  the  system  flow 
of  (a)  standalone  EASO,  (b)  standalone  LAW,  and  (c)  EASO-LAW  Synthesis. 
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(a)  EASO 


(b)  LAW 


(c)  EASO-LAW 


Figure  4.1:  System  flow  diagrams  for  (a)  EASO,  (b)  LAW,  and  (c)  EASO-LAW  Syn¬ 
thesis.  Steps  highlighted  in  red  are  computationally  intensive  (order  of  tens  of  seconds 
or  more),  those  contained  in  blue  are  part  of  the  typical  process  flow,  and  those  in 
green  provide  tradeoff  exploration.  EASO-LAW  Synthesis  (c)  provides  tradeoff  ex¬ 
ploration  while  avoiding  time  intensive  computation  in  the  typical  process  flow. 


Each  of  the  above  diagrams  represents  the  execution  process  for  its  corresponding 
system.  Colors  are  used  to  highlight  key  steps  in  the  execution  process.  Steps  con¬ 
tained  in  the  contiguous  blue  rectangle  are  considered  part  of  the  typical  process 
flow  for  the  system  and  will  be  executed  very  frequently.  These  steps  describe  the 
system  execution  after  initialization,  when  it  is  ready  and  waiting  to  process  new 
tenant  workload  requirements.  Steps  highlighted  require  time  intensive  computation, 
typically  on  the  order  of  tens  of  seconds  or  more.  These  steps  are  associated  with 
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the  computation  of  desirable  resource  efficient  allocations  or  sets  of  allocations  in  the 
case  of  tradeoff  enumeration.  Steps  highlighted  in  green  represent  the  comprehensive 
exploration  of  NCF  tradeoffs,  a  desirable  feature  of  an  orchestration  program,  and 
our  first  fundamental  design  principle. 

By  using  these  colors  to  highlight  the  corresponding  aspects  of  each  of  the  systems 
depicted  in  Figure  4.1,  the  differences  between  the  three  designs,  and  the  comple¬ 
mentary  nature  of  EASO  and  LAW  are  made  clear.  EASO  provides  comprehensive 
tradeoff  exploration,  thus  enabling  a  variety  of  proposed  allocations  to  be  instantiated 
on  demand  according  to  changing  network  conditions  and  operator  objectives  (e.g., 
NCF  weightings),  but  it  performs  this  relatively  expensive  computation  within  its 
typical  process  flow  each  time  a  new  workload  arrives.  This  expensive  computation 
may  hinder  the  ability  of  EASO  to  rapidly  allocate  tenant  workloads  during  periods 
of  high  tenant  demand.  In  contrast,  the  typical  process  flow  of  LAW  contains  no 
computationally  expensive  steps,  as  the  vast  majority  of  tenant  workloads  are  pre¬ 
computed,  thus  permitting  the  fast  allocation  times  necessary  to  keep  up  with  high 
tenant  demand.  However,  because  LAW  does  not  accomplish  tradeoff  exploration,  the 
operator  needs  to  specify  NCF  weights  a  priori  so  that  desirable  LAWs  may  be  pre¬ 
computed.  But  these  weights  may  change  due  to  varying  network  conditions,  tenant 
demand  level,  and  SLA,  requiring  LAWs  to  be  re-precomputed  for  the  new  weightings. 
Thus,  LAW  may  be  unsuitable  for  use  in  dynamic  environments  requiring  frequent 
re-precomputation  of  LAWs. 

EASO-LAW  Synthesis  provides  the  benefits  of  both  EASO  and  LAW,  with 
the  drawbacks  of  neither.  With  EASO-LAW  Synthesis,  we  use  EASO  in  the  pre- 
computation  step  (Step  1  of  Figure  4.1(c)),  to  generate  a  set  of  desirable  tradeoff 
LAWs  for  each  of  the  vast  majority  of  commonly  encountered  workloads.  So  in  con¬ 
trast  to  standalone  LAW,  which  only  stores  a  single  precomputed  LAW  for  each  tenant 
workload  requirement  tuple,  EASO-LAW  Synthesis  stores  a  set  of  precomputed  trade¬ 
off  LAWs  for  each  workload.  Hence,  EASO-LAW  Synthesis  shares  the  comprehensive 
tradeoff  exploration  benefit  of  EASO,  allowing  the  operator  to  dynamically  accom¬ 
modate  changing  objectives  or  network  conditions  on  demand,  without  the  need  for 
LAW  re-precomputation.  Moreover,  EASO-LAW  Synthesis  also  offers  the  speed  ad- 
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vantage  of  LAW,  as  LAW  tradeoff  precomputation  is  performed  in  advance  of  the 
typical  process  flow,  which  is  free  of  steps  involving  time  intensive  computation. 

Observation  1:  EASO  and  LAW  are  complementary,  and  enable  better 
orchestration  when  used  together.  By  combining  EASO  and  LAW  as  described 
above,  we  can  achieve  a  synthesized  data  center  orchestration  solution  that  yields  a  net 
benefit  greater  than  the  sum  of  each  of  its  parts  when  considered  individually,  because 
each  compensates  for  the  drawbacks  of  the  other.  LAW  construction  and  allocation 
algorithms  permit  sub-second  placements  of  large  application  workloads  in  large-scale 
data  center  networks,  but  by  themselves,  only  a  single  application-aware  allocation 
is  considered.  Complementary  to  LAW,  EASO  enables  comprehensive  exploration  of 
the  multi-NCF  tradeoff  space,  but  does  not  explicitly  preserve  application  preferences, 
and  the  quality  of  the  resultant  nondominated  solution  set  is  limited  by  the  “time 
budget”  of  the  data  center  operator  (i.e.,  the  amount  of  time  the  operator  can  afford 
to  let  EASO  run).  However,  when  used  together  as  a  single  orchestration  solution, 
a  set  of  desirable  tradeoffs  for  common  or  existing  application  workloads  can  be 
discovered  by  EASO  and  stored  as  precomputed  LAWs,  thus  enabling  the  fast  and 
application-aware  allocation  of  workloads  corresponding  to  the  on-demand  tradeoff 
selection  preferences  of  the  data  center  operator,  which  are  likely  to  change  frequently 
due  to  dynamic  network  conditions  and  varying  objective  priorities. 

Observation  2:  EASO-LAW  Synthesis  enables  efficient  scaling  existing 
workloads.  A  common  and  effective  approach  for  scaling  existing  application  work¬ 
loads  as  end  user  demand  increases  is  to  simply  allocate  additional  instances  of  the 
workload  in  demand,  as  done  in  the  Network  Function  Center  work  [16].  Because 
EASO-LAW  Synthesis  archives  previously  encountered  workloads,  allocating  addi¬ 
tional  instances  of  in-demand  workloads  can  be  achieved  in  fractions  of  a  second 
using  LAW  allocation,  enabling  efficient  workload  scaling,  even  for  relatively  large 
workloads  and  physical  topologies.  Furthermore,  if  network  conditions  or  operator 
priorities  change  during  the  life  of  a  workload,  because  EASO  has  precomputed  LAWs 
for  a  wide-range  of  desirable  tradeoffs,  the  scaling  of  the  workload  (i.e.,  the  placement 
of  additional  workload  instances  can  be  adjusted  seamlessly  to  accommodate  the  new 
preferences  and  requirements  of  the  operator. 
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Observation  3:  Accurate  application  forecasting  enables  new  workloads 
to  be  placed  quickly  and  efficiently  using  EASO-LAW  Synthesis.  Although 
execution  for  thorough  EASO  tradeoff  exploration  (i.e.,  “long  run”)  may  be  too  slow 
for  dynamic  large-scale  data  center  environments,  if  the  operator  can  forecast  the 
types  of  workloads  that  are  due  to  arrive  in  advance,  he/she  can  run  EASO  prior  to 
their  arrival  and  precompute  the  efficient  LAW  tradeoffs  in  advance,  thus  allowing 
for  fast  allocation  times  when  they  arrive.  In  many  ways,  the  challenges  associated 
with  achieving  fast  and  efficient  allocations  using  EASO-LAW  Synthesis  are  similar 
to  those  associated  with  developing  an  effective  cache  “hit  ratio”.  Developing  and 
implementing  strategies  for  minimizing  real-time  EASO  execution  cycles  seems  to  be 
a  fertile  ground  for  future  work. 

4.3  Future  Work 

In  this  section,  we  identify  the  limitations  of  onr  contributions,  and  then  go  on  to 
outline  the  next  steps  in  this  research  to  address  these  limitations.  Next,  we  iden¬ 
tify  additional  research  areas  that  seem  a  natural  fit  for  our  EASO-LAW  Synthesis 
approach,  and  conclude  the  dissertation. 

Overcoming  limitations  of  EASO.  EASO  has  two  limitations.  First,  for  a  given 
tenant  application  workload  requirement,  it  requires  the  operator  to  select  a  tradeoff 
from  the  nondominated  set  of  candidate  proposals.  Due  to  the  potentially  vast  rnulti- 
NCF  tradeoff  space  (e.g.,  843  different  nondominated  candidate  solutions  for  the 
large-scale  workload  presented  in  in  Section  2.5.3),  requiring  a  human  to  examine 
each  prospective  tradeoff  individually,  without  some  degree  of  automated  assistance, 
may  require  more  time  and  resources  than  EASO  uses  to  generate  the  tradeoffs  in 
the  first  place.  Hence,  for  tradeoff  exploration  to  be  more  useful  in  practice,  decision 
support  systems  and  machine  learning  approaches  could  be  used  to  reduce  the  number 
of  tradeoffs  available  to  the  operator  by  automatically  discarding  those  that  are  least 
likely  to  achieve  operator  goals  while  meeting  tenant  SLA,  given  current  or  forecasted 
network  conditions. 

Second,  EASO  tradeoff  exploration  may  be  time  consuming.  Depending  on  the  work¬ 
load  size,  number  of  applications,  size  of  physical  topology,  and  the  quality  and  di- 
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versity  of  candidate  proposals  desired  by  the  operator,  EASO  tradeoff  exploration 
can  take  from  tens  of  seconds  to  hundreds  of  minutes,  or  longer.  However,  an  im¬ 
plementation  of  our  proposal  for  EASO-LAW  Synthesis  should  effectively  overcome 
this  limitation  by  precomputing  a  wide  range  of  LAWs  for  various  tenant  workload 
specifications  with  respect  to  the  multi-NCF  tradeoff  space. 

Generalizing  LAW  Construction.  In  this  work,  LAWs  are  constructed  to  repre¬ 
sent  a  single  application  (or  application-tier).  However,  complex  applications  com¬ 
prised  of  heterogeneous  “tiers,”  or  sub-applications,  such  as  the  200  VM,  5-tier  tenant 
workload  described  in  Section  2.5.3,  are  commonly  found  in  practice  [3,4,25].  There¬ 
fore,  evaluating  LAW  construction  (precomputation)  and  allocation  for  such  complex 
multi-tier  workloads  in  a  real-world  data  center  environment  is  a  critical  next  step 
towards  realizing  the  deployment  of  a  LAW-based  orchestration  system.  Also,  a  more 
granular  resource  contention  classifier  is  desirable  for  multi-tier  applications,  since 
each  individual  application  tier  may  have  different  resource  contention  characteris¬ 
tics,  and  the  interactions  between  application  tiers  may  be  complex.  For  example, 
a  recent  study  [13]  demonstrated  that  resource  contention  for  a  large-scale  Hadoop 
TeraSort  workload  varied  based  on  the  number  of  clusters  (i.e. ,  groups  of  VMs)  used  to 
execute  the  task.  When  two  4-VM  clusters  were  used,  the  workload  was  bottlenecked 
by  storage  and  network  (DI)  contention.  However,  when  four  4-VM  clusters  were 
used  to  process  the  the  same  task,  it  was  CPU  contention  that  limited  throughput. 

Considering  these  challenges,  we  believe  our  proposed  approach  for  EASO-LAW  Syn¬ 
thesis  is  well  positioned  to  handle  the  placement  of  complex  multi-tier  applications. 
We  demonstrated  in  Section  2.5.3  the  capability  of  EASO  to  generate  a  diverse  set 
of  candidate  proposals  for  allocating  complex  multi-tier  applications,  such  that  each 
proposal  satisfies  tenant  SLA  requirements  and  is  nondominated  with  respect  to  the 
multi-NCF  tradeoff  space  in  terms  of  operator  objectives.  By  retaining  the  intra¬ 
application  tier  labels  for  each  VM  in  an  EASO  proposal,  a  LAW  may  be  constructed 
consisting  of  VMs  from  heterogenous  application  tiers,  as  depicted  in  Figure  4.2.  And 
because  LAW  allocation  preserves  relative  positioning  of  intra-application  VMs  within 
the  physical  infrastructure,  Algorithm  3.3  can  be  used  for  placing  such  heterogeneous 
multi-tier  LAWs  without  modification  to  achieve  application-efficient  workload  place- 
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merits  that  meet  tenant-operator  SLAs.  Statistical  LAW  allocation,  however,  becomes 
more  challenging,  since  per  VM  allocation  of  certain  application  tiers  may  result  in 
breach  of  SLA,  but  not  for  others.  Thus,  a  promising  area  for  future  work  for  al¬ 
locating  heterogenous  multi-tier  LAWs  involves  understanding  and  representing  the 
criticality  of  each  intra-application  tier,  and  customizing  the  Statistical  LAW  process 
for  VM  removal  and  allocation  to  meet  the  requirements  of  each  tier  with  respect  to 


Figure  4.2:  Heterogeneous  multi-tier  LAW  constructed  from  a  proposal  generated  by 
EASO  for  allocating  the  200  VM,  5-tier  (40  VMs  per  tier)  tenant  workload  described 
in  Section  2.5.3.  VMs  of  tier  T1  are  represented  by  green  ovals,  tier  T2  red  ovals,  tier 
T3  brown  ovals,  tier  T4  purple  ovals,  and  tier  T5  blue  ovals.  For  brevity,  logical  host 
allocations  for  each  tier  are  only  shown  for  those  under  the  first  logical  ToR  (vTl). 
For  the  other  logical  ToRs  (vT2-vT4),  the  number  of  VMs  of  each  respective  tier 
allocated  to  logical  hosts  under  underneath  them  is  represented  by  the  numbers  in 
the  corresponding  ovals. 


Infeasibility  is  another  limitation  of  LAW-based  orchestration  approaches.  Clearly,  if 
it  is  infeasible  to  allocate  a  LAW  for  some  tenant  workload,  it  becomes  more  chal¬ 
lenging  to  guarantee  that  the  tenant-operator  SLA  is  satisfied  by  some  workload 
allocation.  And  although  our  initial  Statistical  LAW  evaluations  presented  in  Sec¬ 
tion  3.5.3  demonstrate  marginal  impact  on  intra-application  contention  (e.g.,  RCI), 
determining  whether  or  not  the  sub-optimality  of  some  Statistical  LAW  placement 
constitutes  a  breach  of  SLA,  is  an  open  issue. 
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Of  course,  one  way  of  combating  the  potential  shortfalls  of  Statistical  LAW  allocation 
is  to  simply  increase  complete  LAW  feasibility.  Thus,  exploring  alternative  ways  to 
increase  complete  LAW  feasibility  is  yet  another  fertile  area  for  future  work.  For 
instance,  an  offline  approach  to  rearrange  previously  allocated  LAWs  in  order  to 
maximize  the  chance  of  feasibly  allocating  the  next  workload  seems  promising.  Even 
better  would  be  a  method  of  precomputing  such  rearrangements  to  best  accommodate 
a  range  of  LAWs  of  different  types  and  sizes. 

Real  world  data  center  deployment.  In  this  dissertation,  our  EASO  and  LAW 
workload  allocation  approaches  are  evaluated  using  simulated  data  center  topologies 
and  randomly  generated  workload  traces.  Although  the  range  of  parameters  we  use 
to  define  the  topologies  and  generate  the  workload  traces  are  comparable  to  existing 
work,  our  algorithms  need  to  be  implemented  and  evaluated  on  real  data  center 
networks  and  compared  side  by  side  against  other  solutions,  such  as  Corybantic  [10] , 
Athens  [11],  Ostro  [4],  or  CloudMirror  [3],  for  allocating  real  tenant  workloads  with 
real  SLAs,  in  order  to  validate  our  approaches  in  practice. 


Emerging  opportunities.  In  addition  to  the  above,  we  see  opportunities  to  apply 
our  work  to  two  other  emerging  network  research  areas:  1)  automated  NCF  scaling, 
and  2)  network  service  function  chain  (SFC)  provisioning  of  virtual  network  functions 
(VNFs)  for  general  network  topologies. 

1.  Automatically  adjusting  the  number  of  application  components  of  workloads  (i.e., 
auto-scaling)  based  on  forecast  user  demand  seems  to  be  another  promising  area 
for  applying  our  approach.  Auto-scaling  essentially  adds  another  dimension  to  the 
tradeoff  space  by  considering  the  number  of  workload  VMs  as  a  variable,  rather 
than  a  hard  constraint.  Tradeoffs  involving  more  or  less  VMs  are  interesting  from 
an  SLA  perspective  as  tenants  ideally  would  like  to  achieve  maximum  performance 
with  the  fewest  number  of  VMs,  whereas  cloud  providers  may  want  to  find  oppor¬ 
tunities  to  sell  more  VMs  to  tenants  with  the  promise  of  increased  capability. 

2.  SFCs  are  similar  to  application  workloads  in  that  they  can  be  represented  logically 
in  terms  of  required  resources,  but  SFCs  are  more  complex  as  the  VNFs  that 
comprise  them  (i.e.,  application  VMs  that  perform  specific  network  functions,  such 
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as  stateful  firewall,  traffic  load  balancing,  or  web  caching)  not  only  require  strict 
component  positioning  requirements  like  LAWs,  but  also  require  the  SFC  to  send 
data  through  the  VNFs  in  a  specific  order  to  preserve  network  policy.  This  is 
equivalent  to  establishing  a  partial  or  total  ordering  of  individual  VMs  within  a 
LAW,  and  then  evaluating  resource  usage  and  application  contentions  subject  to 
the  data  flow  defined  by  it. 

It  is  an  exciting  time  to  study  the  orchestration  of  data  center  networks  and  the  diverse 
sets  of  applications  hosted  within  them.  These  networks  are  becoming  ever  more 
important  to  the  mission  sets  of  governments  and  corporations,  as  well  as  the  daily 
routines  of  individuals.  The  size  and  diversity  of  cloud  services  and  the  data  centers 
that  provide  them  seem  poised  to  grow  at  unprecedented  rates  in  the  years  to  come. 
Future  data  center  orchestration  approaches  will  need  to  continue  to  scale  to 
even  larger  networks  and  application  workloads,  while  remaining  flexible  and 
extensible  enough  to  handle  a  wide  range  of  emerging  operator  objectives  and 
network  environments. 
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